
CA3137185A1 - License content error correction method, apparatus, and system - Google Patents

Info

Publication number
CA3137185A1
Authority
CA
Canada
Prior art keywords
target
license content
social credit
unified social
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3137185A
Other languages
French (fr)
Inventor
Yuliang Li
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
10353744 Canada Ltd
Original Assignee
10353744 Canada Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 10353744 Canada Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/226 Validation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a license content error correction method, apparatus, and system. The method comprises: obtaining the administrative division code of the target registration management agency according to the residence in the license content identification information; using it to replace the administrative division code of the registration management agency in the original unified social credit code; verifying the modified original unified social credit code; if verification fails, querying the industrial and commercial database according to the modified original unified social credit code to obtain the corresponding target license content information; calculating the similarity between each item of the license content identification information and the corresponding item of the target license content information; and selecting the correct content to output according to the similarity. Compared with the prior art, this application can correct errors in the identified unified social credit code and improve the tolerance for identification errors in the unified social credit code.

Description

LICENSE CONTENT ERROR CORRECTION METHOD, APPARATUS, AND SYSTEM
Field
[0001] The present disclosure relates to the field of computer technology, and particularly to a license content error correction method, apparatus, and system.
Background
[0002] In financial services, for example when an enterprise applies for a loan from a financial institution, the content of the business license provided by the enterprise needs to be entered into the system for subsequent risk management procedures. Using a machine to identify the content of the business license automatically, instead of entering it manually, can greatly reduce entry costs and improve entry efficiency.
[0003] The existing system first detects the text content of the business license and then identifies the detected content. After identification is completed, the spatial location and text content information are used to extract the identified unified social credit code and enterprise name. The unified social credit code and enterprise name are then used as unique identifiers to query the license content information in the industrial and commercial database. The license content information includes not only the unified social credit code and enterprise name, but also the date of establishment, business period, registered capital, business scope, residence, enterprise type, and legal representative. If the query succeeds, the license content information in the industrial and commercial database is obtained.
[0004] The probability that a piece of content is identified without any error falls exponentially as the content length increases. For the 18-digit unified social credit code, even if the accuracy of a single character is 99%, the overall accuracy is only 0.99^18 ≈ 0.835, so the overall accuracy of the identified unified social credit code is low. In addition, when the unified social credit code is identified, incorrect characters may be added before or after it due to identification errors, so that the identified unified social credit code is longer than 18 digits. Both situations make it impossible to query the license content information from the industrial and commercial database based on the unified social credit code. Only the initial identification content of the business license can then be stored in the system, which eventually results in wrong license content information in the system.
Date Recue/Date Received 2022-01-04
Summary of the Invention
[0005] The present application provides a license content error correction method, apparatus, and system that can correct errors in the identified unified social credit code and improve the tolerance for identification errors in the unified social credit code.
[0006] The present application provides the following solutions:
[0007] The first aspect provides a license content error correction method. The method comprises:
[0008] Obtaining the administrative division code of the target registration management agency according to the residence in the license content identification information, using the administrative division code of the target registration management agency to replace the administrative division code of the registration management agency in the original unified social credit code in the license content identification information, and verifying the modified original unified social credit code;
[0009] If verification fails, querying the industrial and commercial database according to the modified original unified social credit code to obtain the corresponding target license content information, where the target license content information includes the unified social credit code, residence, enterprise name, and enterprise basic information;
[0010] If verification passes, querying the industrial and commercial database according to the modified original unified social credit code and the enterprise name in the license content identification information to obtain the corresponding target license content information.
[0011] Furthermore,
[0012] Before the administrative division code of the target registration management agency is obtained according to the residence in the license content identification information, the method also comprises:
[0013] Starting from the first digit of the original unified social credit code in the license content identification information, selecting a preset number of consecutive characters as the target unified social credit code for verification;
[0014] If verification fails, selecting the preset number of consecutive characters starting from the second digit of the original unified social credit code as the target unified social credit code for verification; if verification fails again, repeating this step from each following digit in turn until a selected group of characters passes verification or the last preset number of characters of the original unified social credit code has been verified.
[0015] Furthermore,
[0016] Obtaining the administrative division code of the target registration management agency according to the residence in the license content identification information, using it to replace the administrative division code of the registration management agency in the original unified social credit code in the license content identification information, and verifying the modified original unified social credit code comprises:
[0017] If the last preset number of characters of the original unified social credit code does not pass verification, selecting the preset number of consecutive characters starting from the first digit of the original unified social credit code as the target unified social credit code, obtaining the administrative division code of the target registration management agency based on the residence in the license content identification information, using it to replace the administrative division code of the registration management agency in the target unified social credit code, at the same time modifying the first two digits of the target unified social credit code to a preset value, and then verifying the modified target unified social credit code;
[0018] If verification fails, selecting the preset number of consecutive characters starting from the second digit of the original unified social credit code as the target unified social credit code, using the administrative division code of the target registration management agency to replace the administrative division code of the registration management agency in the target unified social credit code, at the same time modifying the first two digits of the target unified social credit code to a preset value, and then verifying the modified target unified social credit code; if verification fails, repeating this step until a modified target unified social credit code passes verification or the modified target unified social credit code composed of the last preset number of characters of the original unified social credit code has been verified.
[0019] Furthermore,
[0020] If the verification fails, querying the industrial and commercial database according to the modified unified social credit code and obtaining the corresponding target license content information comprises:
[0021] If the modified target unified social credit code composed of the last preset number of characters of the original unified social credit code has been verified and the verification fails, using the last preset number of characters of the original unified social credit code as the target unified social credit code, using the administrative division code of the target registration management agency to replace the administrative division code of the registration management agency in the target unified social credit code, and performing an exact query in the industrial and commercial database according to the modified target unified social credit code to obtain the corresponding target license content information;
[0022] If the exact query does not find the target license content information, using the fuzzy query engine to query the industrial and commercial database within a preset edit distance to obtain the corresponding target license content information.
[0023] Furthermore, the method also comprises:
[0024] Respectively calculating the similarity between each item of the license content identification information and the corresponding item of the target license content information;
[0025] If the similarity is greater than the preset threshold, selecting the content of the target license content information for use; if the similarity is less than the preset threshold, selecting the content of the license content identification information for use.
[0026] Furthermore, respectively calculating the similarity between each item of the license content identification information and the corresponding item of the target license content information comprises:
[0027] Respectively calculating the similarity of the establishment date, business period, registered capital, business scope, residence, enterprise name, enterprise type, and legal representative between the license content identification information and the target license content information.
[0028] Furthermore, respectively calculating the similarity of the establishment date, business period, registered capital, business scope, residence, enterprise name, enterprise type, and legal representative between the license content identification information and the target license content information comprises:
[0029] Using an edit distance algorithm to calculate the similarity of the establishment date, business period, and registered capital between the license content identification information and the target license content information;
[0030] Using a Jaccard distance algorithm to calculate the similarity of the business scope and residence between the license content identification information and the target license content information;
[0031] After removing the fixed characters from the enterprise name and enterprise type in both the license content identification information and the target license content information, using a Jaccard distance algorithm to calculate the similarity of the enterprise name and enterprise type between the two;
[0032] Using a preset font similarity algorithm to calculate the similarity of the legal representative between the license content identification information and the target license content information.
[0033] Furthermore, the similarities of the establishment date, business period, registered capital, business scope, residence, enterprise name, enterprise type, and legal representative between the license content identification information and the target license content information each correspond to a different preset threshold.
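The per-field comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: edit distance is normalised by the longer string, Jaccard similarity is taken over character sets, and the field names and thresholds in `FIELD_RULES` are hypothetical. The fixed-character removal and the font similarity algorithm for the legal representative are not specified in the text and are left out.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """1 minus the edit distance normalised by the longer string (assumption)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity over character sets (the tokenisation is an assumption)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return 1.0 if not union else len(set_a & set_b) / len(union)


# Hypothetical per-field rules: (similarity function, preset threshold).
FIELD_RULES = {
    "establishment_date": (edit_similarity, 0.8),
    "business_scope": (jaccard_similarity, 0.6),
}


def merge(recognised: dict, reference: dict) -> dict:
    """Keep the database value when it is similar enough, else the recognised one.

    A similarity exactly equal to the threshold falls back to the recognised
    value; the text does not say how ties are handled, so this is an assumption.
    """
    result = {}
    for field, (similarity, threshold) in FIELD_RULES.items():
        score = similarity(recognised[field], reference[field])
        result[field] = reference[field] if score > threshold else recognised[field]
    return result
```

Giving each field its own function and threshold mirrors the statement that every similarity has a different preset threshold: short, rigid fields such as dates tolerate one wrong character, while long free-text fields are compared as sets.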
[0034] The second aspect of the present application provides a license content error correction apparatus. The apparatus comprises:
[0035] A verification unit for obtaining the administrative division code of the target registration management agency according to the residence in the license content identification information, using the administrative division code of the target registration management agency to replace the administrative division code of the registration management agency in the original unified social credit code in the license content identification information, and verifying the modified original unified social credit code;
[0036] A first obtaining unit for, if verification fails, querying the industrial and commercial database according to the modified original unified social credit code to obtain the corresponding target license content information, where the target license content information includes the unified social credit code, residence, enterprise name, and enterprise basic information;
[0037] A second obtaining unit for, if verification passes, querying the industrial and commercial database according to the modified original unified social credit code and the enterprise name in the license content identification information to obtain the corresponding target license content information.
[0038] The third aspect of the present application provides a computer system. The system comprises:
[0039] One or a plurality of processors; and
[0040] A memory associated with the one or a plurality of processors, the memory being configured to store program commands which, when executed by the one or a plurality of processors, perform the method of the first aspect.
[0041] According to the specific implementations provided in this application, this application discloses the following technical effects. The 9th to 17th digits of the unified social credit code are the main body identification code (organization code), and the 17th digit is used to verify the main body identification code. The 18th digit of the unified social credit code is the check code, which is used to verify the unified social credit code. Therefore, the last two digits of the unified social credit code can be used directly to verify the unified social credit code. The administrative division code of the target registration management agency is obtained according to the residence and is used to replace the administrative division code of the registration management agency in the original unified social credit code, and the modified original unified social credit code is then verified. This prevents an error in the administrative division code of the registration management agency in the identified unified social credit code from causing the identified unified social credit code to fail verification. If the modified original unified social credit code still fails verification, the industrial and commercial database is queried according to the modified original unified social credit code to obtain the corresponding target license content information. Because the administrative division code of the registration management agency is then correct, the search is more accurate and the tolerance for identification errors in the unified social credit code is higher.
Drawing Description
[0042] To describe the technical solutions in the implementations of the present application or in the prior art more clearly, the drawings that need to be used are briefly introduced below. Obviously, the drawings in the following description show only some implementations of the application; those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.
[0043] Figure 1 is a process diagram of license content error correction method in implementation 1 of the present application;
[0044] Figure 3 is a structural diagram of apparatus in implementation 2 of the present application.
[0045] Figure 4 is a structural diagram of computer system in implementation 3 of the present application.
Specific implementation methods
[0046] The technical solutions of the implementations in the present application are described below with reference to the accompanying drawings. Obviously, the described implementations are only a part of the implementations of the present application. All other implementations obtained by those of ordinary skill in the art based on the implementations in the present application fall within the protection scope of the present application.
[0047] As described in the background, for the 18-digit unified social credit code, even if the accuracy of a single character is 99%, the overall accuracy is only 0.99^18 ≈ 0.835, so the overall accuracy of the identified unified social credit code is low. In addition, when the unified social credit code is identified, incorrect characters may be added before or after it due to identification errors, so that the identified unified social credit code is longer than 18 digits. Both situations make it impossible to query the license content information from the industrial and commercial database based on the unified social credit code. Only the initial identification content of the business license can then be stored in the system, which eventually results in wrong license content information in the system.
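The accuracy figure can be checked with a few lines of Python. The sketch assumes that character errors are independent, which the text implies but does not state; the function name is illustrative.

```python
def whole_string_accuracy(per_char_accuracy: float, length: int) -> float:
    """Probability that every one of `length` characters is recognised correctly,
    assuming independent per-character errors."""
    return per_char_accuracy ** length


# 18 characters at 99% per-character accuracy
print(round(whole_string_accuracy(0.99, 18), 3))  # 0.835
```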
[0048] For this purpose, this application proposes a method for license content error correction. The 9th to 17th digits of the unified social credit code are the main body identification code (organization code), and the 17th digit is used to verify the main body identification code. The 18th digit of the unified social credit code is the check code, which is used to verify the unified social credit code, so the last two digits of the unified social credit code can be used directly to verify the unified social credit code. The administrative division code of the target registration management agency is obtained according to the residence and is used to replace the administrative division code of the registration management agency in the original unified social credit code, and the modified original unified social credit code is then verified. This prevents an error in the administrative division code of the registration management agency in the identified unified social credit code from causing the identified unified social credit code to fail verification. If the modified original unified social credit code still fails verification, the industrial and commercial database is queried according to the modified original unified social credit code to obtain the corresponding target license content information. Because the administrative division code of the registration management agency is then correct, the search is more accurate and the tolerance for identification errors in the unified social credit code is higher.
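The text does not say which algorithm performs the verification. As an illustration only, the sketch below assumes that the 18th-digit check follows the weighted modulo-31 scheme of the national coding standard GB 32100-2015; the check on the 17th digit (the organization code's own check character) is not covered. The names `ALPHABET`, `WEIGHTS`, `check_char`, and `verify` are illustrative.

```python
# Assumed scheme: GB 32100-2015 weighted mod-31 check over the first 17 characters.
ALPHABET = "0123456789ABCDEFGHJKLMNPQRTUWXY"  # 31 symbols; I, O, S, V, Z are not used
WEIGHTS = [1, 3, 9, 27, 19, 26, 16, 17, 20, 29, 25, 13, 8, 24, 10, 30, 28]


def check_char(body: str) -> str:
    """Return the check character for a 17-character body."""
    if len(body) != 17:
        raise ValueError("body must have exactly 17 characters")
    total = sum(ALPHABET.index(ch) * w for ch, w in zip(body, WEIGHTS))
    return ALPHABET[(31 - total % 31) % 31]


def verify(code: str) -> bool:
    """True if `code` has 18 valid characters and a matching check character."""
    if len(code) != 18 or any(ch not in ALPHABET for ch in code):
        return False
    return check_char(code[:17]) == code[17]
```

Under this assumed scheme a code containing a letter outside the alphabet, such as a capital O misread for a zero, is rejected before the checksum is even computed. The example codes in this description are illustrative and need not satisfy this particular checksum.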
[0049] Implementation one
[0050] The implementations of the present application provide a method for correcting license content errors. As an example, the method is applied to a license content error correction apparatus. The apparatus can be configured in any computer device so that the computer device can execute the license content error correction method.
[0051] As shown in Figure 1, the above-mentioned method comprises:
[0052] S11, obtaining the administrative division code of the target registration management agency according to the residence in the license content identification information, using the administrative division code of the target registration management agency to replace the administrative division code of the registration management agency in the original unified social credit code in the license content identification information, and verifying the modified original unified social credit code;
[0053] The license content identification information is obtained by identifying the text content of the business license, so it is not fully accurate: the unified social credit code, residence, and enterprise name it contains may be wrong. Because the unified social credit code and enterprise name are subsequently used as unique identifiers to query the license content information in the industrial and commercial database, the correctness of the unified social credit code must be ensured. For example, suppose that the identified original unified social credit code is 91333300MA0K75990X, and the administrative division code of the target registration management agency is obtained according to the residence. The residence is Taiyuan City, Shanxi Province, so the city of the residence is Taiyuan City, whose administrative division code is 140100. Using 140100 to replace the administrative division code 333300 of the registration management agency in the identified original unified social credit code gives the modified original unified social credit code 91140100MA0K75990X, which is then verified. The 9th to 17th digits are the main body identification code (organization code), the 17th digit is used to verify the main body identification code, and the 18th digit is the check code, which is used to verify the unified social credit code. Therefore, the last two digits can be used directly to verify the main body identification code and the unified social credit code, and thereby to judge whether the unified social credit code is correct.
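The replacement step in this example can be sketched as a simple string operation on the 3rd to 8th digits. The lookup table `RESIDENCE_TO_DIVISION` and the function name are hypothetical; a real system would presumably derive the division code from a full administrative division table and would first have to extract the city from the residence text.

```python
# Hypothetical lookup table from city name to administrative division code.
RESIDENCE_TO_DIVISION = {"Taiyuan City": "140100"}


def replace_division_code(code: str, division_code: str) -> str:
    """Overwrite digits 3 to 8 of an 18-character code with `division_code`."""
    if len(code) != 18 or len(division_code) != 6:
        raise ValueError("expected an 18-character code and a 6-digit division code")
    return code[:2] + division_code + code[8:]


print(replace_division_code("91333300MA0K75990X", RESIDENCE_TO_DIVISION["Taiyuan City"]))
# 91140100MA0K75990X
```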
[0054] S12, if verification fails, querying the industrial and commercial database according to the modified original unified social credit code to obtain the corresponding target license content information, where the target license content information includes the unified social credit code, residence, enterprise name, and enterprise basic information;
[0055] If the modified original unified social credit code still fails verification, the industrial and commercial database is queried according to the modified original unified social credit code to obtain the corresponding target license content information. The target license content information includes the unified social credit code, residence, enterprise name, and enterprise basic information. The enterprise basic information includes the date of establishment, business period, registered capital, business scope, enterprise type, and legal representative.
[0056] S13, if verification passes, querying the industrial and commercial database according to the modified original unified social credit code and the enterprise name in the license content identification information to obtain the corresponding target license content information.

[0057] If the verification passes, the modified original unified social credit code is taken to be correct. The industrial and commercial database is queried according to the modified original unified social credit code and the enterprise name in the license content identification information to obtain the corresponding target license content information. The query first performs an exact search, which normally finds the record. However, a unified social credit code that has passed verification still has a very small probability of being wrong. Therefore, if the exact query finds nothing, the fuzzy query engine is used to search the industrial and commercial database within the preset edit distance to obtain the corresponding target license content information.
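The exact-then-fuzzy lookup can be sketched as below. The industrial and commercial database is represented by an in-memory dictionary keyed by unified social credit code, and the fuzzy query engine by a linear scan with Levenshtein distance; both are stand-ins chosen for illustration, and the record contents are made up. Matching on the enterprise name is omitted for brevity.

```python
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def query(code: str, database: dict, max_distance: int = 1) -> Optional[dict]:
    """Exact lookup first; otherwise the closest key within `max_distance` edits."""
    if code in database:
        return database[code]
    best = None  # (distance, record) of the closest key found so far
    for key, record in database.items():
        distance = edit_distance(code, key)
        if distance <= max_distance and (best is None or distance < best[0]):
            best = (distance, record)
    return best[1] if best else None


# Illustrative data only.
db = {"91140100MA0K75990X": {"enterprise_name": "Example Co"}}
print(query("91140100MA0K75998X", db))  # one character differs, found by the fuzzy step
```

A production system would replace the linear scan with an indexed fuzzy search; the preset edit distance plays the same role as `max_distance` here.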
[0058] Before the administrative division code of the target registration management agency is obtained according to the residence in the license content identification information, the method also comprises:
[0059] Starting from the first digit of the original unified social credit code in the license content identification information, selecting a preset number of consecutive characters as the target unified social credit code for verification;
[0060] If verification fails, selecting the preset number of consecutive characters starting from the second digit of the original unified social credit code as the target unified social credit code for verification; if verification fails again, repeating this step from each following digit in turn until a selected group of characters passes verification or the last preset number of characters of the original unified social credit code has been verified.
[0061] When the unified social credit code is identified, wrong characters may be added before or after the recognized original unified social credit code due to identification errors, making it longer than 18 digits. It is therefore necessary to select a preset number of consecutive characters, starting from the first digit of the original unified social credit code, as the target unified social credit code for verification. Since the current unified social credit code has 18 digits, the preset number of characters is 18. If verification fails, 18 characters are selected starting from the second digit of the original unified social credit code and verified. If verification still fails, further groups of 18 characters are selected and verified in turn, until a selected group of 18 characters passes verification or the last 18 characters of the original unified social credit code have been selected and verified.
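This sliding selection of 18-character groups can be sketched as follows. The verification routine is passed in as a parameter because the text does not fix its implementation; the stand-in `verify` used in the usage lines simply accepts one known string and is purely illustrative.

```python
from typing import Callable, Iterator, Optional

CODE_LENGTH = 18  # preset number of characters


def windows(raw: str, size: int = CODE_LENGTH) -> Iterator[str]:
    """Yield every run of `size` consecutive characters, left to right."""
    for start in range(len(raw) - size + 1):
        yield raw[start:start + size]


def first_verified_window(raw: str, verify: Callable[[str], bool]) -> Optional[str]:
    """Return the first window accepted by `verify`, or None if none passes."""
    for candidate in windows(raw):
        if verify(candidate):
            return candidate
    return None


# Stand-in verifier for illustration: accepts exactly one string.
accept = lambda code: code == "91140100MA0K75990X"
print(first_verified_window("0091140100MA0K75990X", accept))  # 91140100MA0K75990X
```

A recognised string shorter than 18 characters yields no windows, so the function returns None and the later correction steps take over.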
[0062] Obtaining the administrative division code of the target registration management agency according to the residence in the license content identification information, using it to replace the administrative division code of the registration management agency in the original unified social credit code in the license content identification information, and verifying the modified original unified social credit code comprises:
[0063] If the last preset number of characters of the original unified social credit code does not pass verification, selecting the preset number of consecutive characters starting from the first digit of the original unified social credit code as the target unified social credit code, obtaining the administrative division code of the target registration management agency based on the residence in the license content identification information, using it to replace the administrative division code of the registration management agency in the target unified social credit code, at the same time modifying the first two digits of the target unified social credit code to a preset value, and then verifying the modified target unified social credit code;
[0064] If verification fails, selecting the preset number of consecutive characters starting from the second digit of the original unified social credit code as the target unified social credit code, using the administrative division code of the target registration management agency to replace the administrative division code of the registration management agency in the target unified social credit code, at the same time modifying the first two digits of the target unified social credit code to a preset value, and then verifying the modified target unified social credit code; if verification fails, repeating this step until a modified target unified social credit code passes verification or the modified target unified social credit code composed of the last preset number of characters of the original unified social credit code has been verified.
[0065] If the verification fails when the last 18 characters of the original unified social credit code are verified, then 18 characters are selected in order from the first digit of the original unified social credit code as the target unified social credit code, obtaining the corresponding administrative division code of the target registration management agency according to the residence, and using the administrative division code of the target registration management agency to replace the administrative division code of the registration management agency in the target unified social credit code, for example, suppose the correct unified social credit code is 9 11 4 0 1 0 0 MA 0 K 7 5 9 9 0 X, but the identified unified social credit code is 0091333300MAOK75990X, selecting 18 digits from the first digit 0091333300MAOK7599, use 140100 replaces 913333, and at the same time, the first two digits of the target unified social credit code are modified to preset value, since the first two digits of the unified social credit code can only be 91, 92, 93, and 99, they are modified to 91, 92, 93 and 99, that is, modifying the target unified social credit code to 9114010000MA0K7599, if the verification fails, then modifying it to 9214010000MA0K7599, if the verification fails, then modifying it to 9314010000MAOK7599, if the verification fails, then modifying it to 9914010000MA0K7599, in here, it is not limited to modify the first two digits of the target unified social credit code to 91, it can also randomly select one of them for each modification until the modified target unified social credit code is verified or all four are selected;
[0066] If the modified target unified social credit code fails verification, 18 characters are selected in order from the second digit of the original unified social credit code as the target unified social credit code, 091333300MA0K75990. The third to eighth characters are again replaced with 140100, the first two digits are modified to 91, 92, 93, and 99, and each modified target unified social credit code is verified in turn. If none passes, 18 characters are selected starting from the third digit of the original unified social credit code as the target unified social credit code, 91333300MA0K75990X, and 333300 is replaced with 140100. When the first two digits are modified to 91, the target unified social credit code 91140100MA0K75990X passes verification. If the modified target unified social credit code had still failed verification, verification would end here, since this selection is the last 18 characters of the original unified social credit code.
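The window-by-window correction in paragraphs [0064] to [0066] can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, and the verification rule is not specified in this section, so it is passed in as a caller-supplied predicate `verify`.

```python
from typing import Callable, Iterator, Optional

CODE_LEN = 18                          # preset number of characters
PREFIXES = ("91", "92", "93", "99")    # preset values for the first two digits


def candidate_codes(original: str, division_code: str) -> Iterator[str]:
    """Yield modified target codes, window by window and prefix by prefix."""
    for start in range(len(original) - CODE_LEN + 1):
        window = original[start:start + CODE_LEN]
        # Characters 3 to 8 of the window are replaced by the division code.
        replaced = window[:2] + division_code + window[8:]
        for prefix in PREFIXES:
            # The first two characters are overwritten with each preset value.
            yield prefix + replaced[2:]


def correct_code(original: str, division_code: str,
                 verify: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate accepted by `verify`, or None."""
    for candidate in candidate_codes(original, division_code):
        if verify(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Stand-in predicate for illustration only; it is not a real checksum.
    accept = lambda code: code == "91140100MA0K75990X"
    print(correct_code("0091333300MA0K75990X", "140100", accept))
```

With the 20-character example from the text, the first candidate is 9114010000MA0K7599 and the ninth is 91140100MA0K75990X, matching the sequence described above. When no candidate is accepted the function returns None, which corresponds to falling through to the database queries.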
[0067] If the verification fails, querying in the industrial and commercial database according to the modified unified social credit code, and obtaining the corresponding target license content information, comprising:
[0068] If the modified target unified social credit code composed of the last preset number of characters of the original unified social credit code has been verified and the verification fails, the last preset number of characters of the original unified social credit code are used as the target unified social credit code, the administrative division code of the target registration management agency is used to replace the administrative division code of the registration management agency in the target unified social credit code, and an accurate query is performed in the industrial and commercial database according to the modified target unified social credit code to obtain the corresponding target license content information;
[0069] If the target license content information is not found by the accurate query, the fuzzy query engine is used to query the industrial and commercial database according to a preset edit distance to obtain the corresponding target license content information.
[0070] If every selection has been tried, up to the modified target unified social credit code composed of the last 18 characters of the original unified social credit code, and the verification still fails, an accurate query is performed in the industrial and commercial database based on that modified target unified social credit code. If the accurate query finds nothing, the fuzzy query engine is used to search the industrial and commercial database according to the preset edit distance. The fuzzy query engine used in this application is ElasticSearch, but the application is not limited to this engine. The preset edit distance is set to 5, at which 76.47% of the identification errors can be covered, so that the corresponding target license content information can be obtained accurately. If the target license content information is still not found, a word segmentation query is performed in the industrial and commercial database according to the enterprise name to obtain the corresponding target license content information.
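The three-stage fallback (accurate query, fuzzy query within the preset edit distance, then a name-based query) can be illustrated with a small in-memory sketch. It does not use ElasticSearch; the list of records, the field names `code` and `name`, and the whitespace tokenizer standing in for word segmentation are all assumptions made for the example.

```python
from typing import Callable, List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]


def lookup(records: List[dict], code: str, name: str, max_distance: int = 5,
           segment: Callable[[str], List[str]] = str.split) -> Optional[dict]:
    # Stage 1: accurate query on the modified code.
    for record in records:
        if record["code"] == code:
            return record
    # Stage 2: fuzzy query, closest code within the preset edit distance.
    best = min(records, key=lambda r: edit_distance(r["code"], code), default=None)
    if best is not None and edit_distance(best["code"], code) <= max_distance:
        return best
    # Stage 3: name query, the record sharing the most tokens with the name.
    tokens = set(segment(name))
    best = max(records, key=lambda r: len(tokens & set(segment(r["name"]))),
               default=None)
    if best is not None and tokens & set(segment(best["name"])):
        return best
    return None
```

The default `max_distance=5` mirrors the preset edit distance given in the text. For Chinese enterprise names a real word segmenter would replace `str.split`.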
[0071] The method also comprises:
[0072] S14, respectively calculating the similarity of each content item between the license content identification information and the target license content information;
[0073] S15, if the similarity is greater than a preset threshold, selecting the content of the target license content information for use; if the similarity is less than the preset threshold, selecting the content of the license content identification information for use.
[0074] The license content information in the industrial and commercial database is not always updated in time: after the information on a business license is changed, the corresponding license content information stored in the industrial and commercial database may remain unchanged, so the license content identification information can be accurate while the license content information in the industrial and commercial database is out of date. For this reason, the similarity of each content item is calculated separately between the license content identification information and the target license content information. If the similarity is greater than the preset threshold, that is, the two values are very close, the difference is attributed to an identification error. For example, if the identified character is "three" (三) but the license content information holds "king" (王), two visually similar characters, the similarity is high, so it is considered an identification error and the "king" in the license content information is used as the correct content. If the similarity is less than the preset threshold, the content of the license content identification information is used instead.
[0075] Respectively calculating each content similarity corresponding to the license content identification information and the target license content information, comprising:
[0076] Respectively calculating the similarity of establishment date, business period, registered capital, business scope, residence, enterprise name, enterprise type, and legal representative corresponding to the license content identification information and the target license content information.
[0077] Respectively calculating the similarity of establishment date, business period, registered capital, business scope, residence, enterprise name, enterprise type, and legal representative corresponding to the license content identification information and the target license content information, comprising:
[0078] Using edit distance algorithm to respectively calculate the similarity of establishment date, business period, and registered capital corresponding to the license content identification information and the target license content information;
[0079] Using Jaccard distance algorithm to calculate the similarity of business scope and residence corresponding to the license content identification information and the target license content information;
[0080] After removing the fixed characters in the enterprise name and enterprise type corresponding to the license content identification information and the target license content information, using Jaccard distance algorithm to calculate the similarity of enterprise name and enterprise type corresponding to the license content identification information and the target license content information;
[0081] Using preset font similarity algorithm to calculate the similarity of legal representative corresponding to the license content identification information and the target license content information;
[0082] Four similarity calculation methods are applied according to the characteristics of the nine fields, such as length, character composition, and whether there is a fixed part. The unified social credit code, establishment date, business period, and registered capital are long strings of digits or letters, so the edit distance algorithm is used to calculate their similarity. The business scope and residence are long Chinese character strings, so the Jaccard distance algorithm is used to calculate their similarity. The enterprise name and enterprise type usually contain fixed characters such as "limited liability company"; after the fixed characters are removed, the Jaccard distance algorithm is used to calculate the similarity. The legal representative is a short Chinese character string, so the preset font similarity algorithm is used to calculate the similarity.
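Two of the four similarity methods named above can be sketched briefly. The normalization of edit distance into a score between 0 and 1, the character-level Jaccard sets, and the `FIXED_PARTS` list are assumptions for illustration; the text does not specify them. The preset font similarity algorithm is not described in this section and is not sketched here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """1 minus the edit distance divided by the longer length (assumed scaling)."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the shared character set over the size of the combined set."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical list of fixed parts removed before comparing names and types.
FIXED_PARTS = ("Limited Liability Company",)


def strip_fixed(text: str) -> str:
    """Remove the fixed parts so that only the distinctive text is compared."""
    for part in FIXED_PARTS:
        text = text.replace(part, "")
    return text.strip()
```

For an enterprise name, the comparison would then be `jaccard_similarity(strip_fixed(identified_name), strip_fixed(stored_name))`, so that a shared suffix does not inflate the score.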
[0083] The similarities of establishment date, business period, registered capital, business scope, residence, enterprise name, enterprise type, and legal representative between the license content identification information and the target license content information each correspond to a different preset threshold.
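The selection rule of step S15 combined with per-field thresholds can be sketched as below. The field names, the threshold values used in any call, and the handling of a similarity exactly equal to the threshold (treated here as "not greater") are assumptions; the text only states the greater-than and less-than cases.

```python
from typing import Dict


def merge_fields(identified: Dict[str, str], stored: Dict[str, str],
                 scores: Dict[str, float],
                 thresholds: Dict[str, float]) -> Dict[str, str]:
    """Pick, field by field, the stored or the identified content."""
    result = {}
    for field, identified_value in identified.items():
        if scores[field] > thresholds[field]:
            # Very similar: treated as an identification error, use stored content.
            result[field] = stored[field]
        else:
            # Not similar enough: stored content may be stale, keep identified content.
            result[field] = identified_value
    return result
```

The similarity scores are passed in precomputed so that each field can use whichever of the four calculation methods applies to it.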
[0084] Implementation two
[0085] The license content error correction apparatus as shown in Figure 2, wherein, the apparatus comprises:
[0086] A verification unit 21 for obtaining corresponding administrative division code of target registration management agency according to residence of license content identification information, using the administrative division code of target registration management agency to replace the administrative division code of registration management agency in the original unified social credit code in license content identification information, verifying modified original unified social credit code;
[0087] The license content identification information is obtained by recognizing the text of the business license, so it is not always accurate: the unified social credit code, residence, and enterprise name it contains may be wrong. Because the unified social credit code and enterprise name are subsequently used as the unique identifier to query the license content information in the industrial and commercial database, the correctness of the unified social credit code must be ensured. For example, suppose that the identified original unified social credit code is 91333300MA0K75990X and the residence is Taiyuan City, Shanxi Province. The city of the residence is Taiyuan City, whose administrative division code is 140100, so 140100 is used to replace the administrative division code 333300 of the registration management agency in the identified original unified social credit code. The modified original unified social credit code is 91140100MA0K75990X, which is then verified.
[0088] A first obtaining unit 22 for if verification fails, querying in industrial and commercial database according to the modified original unified social credit code to obtain corresponding target license content information, the target license content information includes unified social credit code, residence, enterprise name, and enterprise basic information;
[0089] If the modified original unified social credit code still fails the verification, the first obtaining unit 22 queries in the industrial and commercial database according to the modified original unified social credit code to obtain the corresponding target license content information, and the target license content information includes the unified social credit code, residence, enterprise name, and basic enterprise information, the basic enterprise information includes the date of establishment, business period, registered capital, business scope, enterprise type, and legal representative.
[0090] A second obtaining unit 23 for, if verification passes, querying in industrial and commercial database according to the modified original unified social credit code and enterprise name in license content identification information to obtain corresponding target license content information.
[0091] If the verification passes, indicating that the modified unified social credit code is correct, the second obtaining unit 23 queries the industrial and commercial database according to the modified original unified social credit code and the enterprise name in the license content identification information to obtain the corresponding target license content information.
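How the three units cooperate can be sketched as a single dispatch function. The names `run_apparatus`, `query_by_code`, and `query_by_code_and_name`, and the dictionary keys, are hypothetical; the verification rule and the database queries are supplied by the caller because this section does not define them.

```python
from typing import Callable, Dict, Optional


def run_apparatus(info: Dict[str, str], division_code: str,
                  verify: Callable[[str], bool],
                  query_by_code: Callable[[str], Optional[dict]],
                  query_by_code_and_name: Callable[[str, str], Optional[dict]],
                  ) -> Optional[dict]:
    code = info["code"]
    # Verification unit 21: swap in the division code derived from the residence.
    modified = code[:2] + division_code + code[8:]
    if verify(modified):
        # Second obtaining unit 23: query by modified code and enterprise name.
        return query_by_code_and_name(modified, info["name"])
    # First obtaining unit 22: query by modified code only.
    return query_by_code(modified)
```

For the example in paragraph [0087], `info["code"]` is 91333300MA0K75990X and `division_code` is 140100, so both query paths receive 91140100MA0K75990X.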
[0092] This implementation of the application provides a license content error correction apparatus that belongs to the same application concept as the license content error correction method provided in the implementations of the application. The apparatus can execute that method and has the corresponding functional modules and beneficial effects. For technical details not described in this implementation, refer to the license content error correction method provided in the implementations of this application; they are not repeated here.
[0093] Implementation three
[0094] Corresponding to the above methods and apparatus, a computer system is provided, the system comprises:
[0095] One or a plurality of processors; and
[0096] A memory associated with one or a plurality of processors, the memory is configured to store program commands, if the program commands are executed by one or a plurality of processors, executing following operations:
[0097] Obtaining corresponding administrative division code of target registration management agency according to residence of license content identification information, using the administrative division code of target registration management agency to replace the administrative division code of registration management agency in the original unified social credit code in license content identification information, verifying modified original unified social credit code;
[0098] If verification fails, querying in industrial and commercial database according to the modified original unified social credit code to obtain corresponding target license content information, the target license content information includes unified social credit code, residence, enterprise name, and enterprise basic information;
[0099] If verification passes, querying in industrial and commercial database according to the modified original unified social credit code and enterprise name in license content identification information to obtain corresponding target license content information.
[0100] Figure 3 shows an exemplary architecture of the computer system, which can specifically include a processor 1510, video display adapter 1511, disk driver 1512, input/output interface 1513, network interface 1514, and memory 1520. The above-mentioned processor 1510, video display adapter 1511, disk driver 1512, input/output interface 1513, network interface 1514, and memory 1520 can be connected through a communication bus 1530.
[0101] The processor 1510 can be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is used to execute the relevant programs to achieve the technical solutions provided in this application.
[0102] The memory 1520 can be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, or a dynamic storage device. The memory 1520 can store the operating system 1521 used to control the running of the computer system 1500, and the Basic Input Output System (BIOS) 1522 used to control the low-level operation of the computer system 1500. In addition, it can store a web browser 1523, data storage management 1524, a device identity information processing system 1525, and so on. The device identity information processing system 1525 can be the specific application that implements the above-mentioned steps. In short, when the technical solutions provided by this application are implemented through software or firmware, the related program code is stored in the memory 1520 and executed by the processor 1510.
[0103] The input/output interface 1513 is used to connect input/output modules for information input and output. An input/output module can be configured in the device as a component (not shown in the figure), or it can be connected to the device externally to provide the corresponding functions. Input devices can include keyboards, mice, touch screens, microphones, various sensors, etc., and output devices can include monitors, speakers, vibrators, lights, and so on.
[0104] The network interface 1514 is used to connect a communication module (not shown in the figure) to enable communication between this device and other devices. The communication module can communicate through wired means (such as USB, network cable, etc.) or through wireless means (such as mobile network, WIFI, Bluetooth, etc.).
[0105] The bus 1530 includes a path that transmits information among the various components of the device (such as the processor 1510, the video display adapter 1511, the disk driver 1512, the input/output interface 1513, the network interface 1514, and the memory 1520).
[0106] In addition, the electronic device 1500 can also obtain information on specific receiving conditions from the virtual resource object's receiving condition information database 1541 for condition judgement and so on.

[0107] It should be noted that although the above device only shows the processor 1510, the video display adapter 1511, the disk driver 1512, the input/output interface 1513, the network interface 1514, the memory 1520, the bus 1530, etc., in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art can understand that the above apparatus may comprise only the components necessary to implement the solution of the present application and need not contain all the components shown in the figure.
[0108] From the description of the above implementations, those skilled in the art can clearly understand that the application can be implemented by means of software plus a necessary general hardware platform. Based on this understanding, the essence of the technical solution of this application, or in other words the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product can be stored in storage media, such as ROM/RAM, magnetic disks, optical disks, etc., and includes several commands that cause a computer device (which can be a personal computer, a cloud server, a network device, etc.) to execute the methods described in each implementation, or in some parts of the implementations, of the present application.
[0109] The various implementations in this description are described in a progressive manner. The same and similar parts among the various implementations can be referred to each other, and each implementation focuses on its differences from the other implementations. In particular, since the system implementations are basically similar to the method implementation, their description is relatively brief; for related details, refer to the method implementation. The system implementations described above are only illustrative. The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; that is, they can be located in one place or distributed over a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the implementation's solution. Those of ordinary skill in the art can understand and implement it without creative work.
[0110] The above are only preferred implementations of the present invention and are not used to restrict the present invention. Any modifications, equivalent replacements, and improvements made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.


Claims (10)

Claims:
1. A license content error correction method, wherein, the method comprises:
obtaining corresponding administrative division code of target registration management agency according to residence of license content identification information, using the administrative division code of target registration management agency to replace the administrative division code of registration management agency in the original unified social credit code in license content identification information, verifying modified original unified social credit code;
if verification fails, querying in industrial and commercial database according to the modified original unified social credit code to obtain corresponding target license content information, the target license content information includes unified social credit code, residence, enterprise name, and enterprise basic information; and if verification passes, querying in industrial and commercial database according to the modified original unified social credit code and enterprise name in license content identification information to obtain corresponding target license content information.
2. The license content error correction method according to claim 1, wherein, before the corresponding administrative division code of target registration management agency is obtained according to the residence of license content identification information, the method also comprises:
starting from first digit of the original unified social credit code in the license content identification information, sequentially selecting characters with a preset number of digits as target unified social credit code for verification; and if verification fails, selecting characters with a preset number of digits from the second digit of the original unified social credit code as target unified social credit code for verification, if verification fails, repeating execution until the selected preset digits of characters are verified or verified to last preset number of characters of the original unified social credit code.
3. The license content error correction method according to claim 2, wherein, obtaining corresponding administrative division code of target registration management agency according to residence in license content identification information, and the target administrative division code of registration management agency is used to replace the administrative division code of registration management agency in the original unified social credit code in the license content identification information, verifying the modified original unified social credit code, comprising:
if characters of last preset number of digits of the original unified social credit code are not verified, then the characters of the preset number of digits are selected in order from the first digit of the original unified social credit code as the target unified social credit code, obtaining corresponding administrative division code of target registration management agency based on residence in the license content identification information, using the administrative division code of target registration management agency to replace the administrative division code of registration management agency in the target unified social credit code, at the same time modifying the first two digits of the target unified social credit code to preset value, then verifying the modified target unified social credit code; and if verification fails, selecting predetermined number of characters from second digit of the original unified social credit code as the target unified social credit code, using the administrative division code of the target registration management agency to replace the administrative division code of the registration management agency in the target unified social credit code, at the same time, modifying the first two digits of the target unified social credit code to preset value, then verifying the modified target unified social credit code, if the verification fails, repeating the execution until the modified target unified social credit code is verified or verified to the modified target unified social credit code composed of last preset number of characters of the original unified social credit code.
4. The license content error correction method according to claim 3, wherein, if the verification fails, querying in the industrial and commercial database according to the modified unified social credit code, and obtaining the corresponding target license content information, comprising:
if the modified target unified social credit code composed of last preset digits of the original unified social credit code is verified, and the verification fails, the characters of last preset digit of the original unified social credit code are used as the target unified social credit code, and the administrative division code of the target registration management agency is used to replace the administrative division code of the registration management agency in the target unified social credit code, according to the modified target unified social credit code, performing an accurate query in the industrial and commercial database to obtain the corresponding target license content information; and if the target license content information is not found in the query, the fuzzy query engine is used to query in the industrial and commercial database according to preset edit distance to obtain the corresponding target license content information.
5. The license content error correction method according to any one of claims 1 to 4, wherein, the method also comprises:
respectively calculating each content similarity corresponding to the license content identification information and the target license content information; and if the similarity is greater than preset threshold, selecting the content of the target license content information to use, if the similarity is less than preset threshold, selecting the content of the license content identification information to use.
6. The license content error correction method according to claim 5, wherein, respectively calculating each content similarity corresponding to the license content identification information and the target license content information, comprising:
respectively calculating the similarity of establishment date, business period, registered capital, business scope, residence, enterprise name, enterprise type, and legal representative corresponding to the license content identification information and the target license content information.
7. The license content error correction method according to claim 6, wherein, respectively calculating the similarity of establishment date, business period, registered capital, business scope, residence, enterprise name, enterprise type, and legal representative corresponding to the license content identification information and the target license content information, comprising:
using edit distance algorithm to respectively calculate the similarity of corresponding establishment date, business period, and registered capital corresponding to the license content identification information and the target license content information;
using Jaccard distance algorithm to calculate the similarity of business scope and residence corresponding to the license content identification information and the target license content information;
after removing the fixed characters in the enterprise name and enterprise type corresponding to the license content identification information and the target license content information, using Jaccard distance algorithm to calculate the similarity of enterprise name and enterprise type corresponding to the license content identification information and the target license content information; and using preset font similarity algorithm to calculate the similarity of legal representative corresponding to the license content identification information and the target license content information;
8. The license content error correction method according to claim 7, wherein, the method comprises:
The similarity of establishment date, business period, registered capital, business scope, residence, enterprise name, enterprise type, and legal representative corresponding to the license content identification information and the target license content information respectively corresponds to different preset threshold.
9. The license content error correction apparatus, wherein, the apparatus comprises:
a verification unit for obtaining corresponding administrative division code of target registration management agency according to residence of license content identification information, using the administrative division code of target registration management agency to replace the administrative division code of registration management agency in the original unified social credit code in license content identification information, verifying modified original unified social credit code;
a first obtaining unit for if verification fails, querying in industrial and commercial database according to the modified original unified social credit code to obtain corresponding target license content information, the target license content information includes unified social credit code, residence, enterprise name, and enterprise basic information; and a second obtaining module for if verification passes, querying in industrial and commercial database according to the modified original unified social credit code and enterprise name in license content identification information to obtain corresponding target license content information.
10. A computer system, wherein, the system comprises:
one or a plurality of processors; and a memory associated with one or a plurality of processors, the memory is configured to store program commands, if the program commands are executed by one or a plurality of processors, executing the method according to any one of claims 1 to 8.
CA3137185A 2020-10-30 2021-11-01 License content error correction method, apparutus, and system Pending CA3137185A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011190891.3A CN112364630B (en) 2020-10-30 2020-10-30 Certificate content error correction method, device and system
CN202011190891.3 2020-10-30

Publications (1)

Publication Number Publication Date
CA3137185A1 true CA3137185A1 (en) 2022-04-30

Family

ID=74513119

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3137185A Pending CA3137185A1 (en) 2020-10-30 2021-11-01 License content error correction method, apparutus, and system

Country Status (2)

Country Link
CN (1) CN112364630B (en)
CA (1) CA3137185A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114677694A (en) * 2022-03-30 2022-06-28 深圳市福流网络信息科技有限公司 A customs clearance method with intelligent identification technology

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004302863A (en) * 2003-03-31 2004-10-28 Japan Research Institute Ltd Data management device, data management method, and program for causing computer to execute the method
CN108399405B (en) * 2017-02-07 2023-06-27 腾讯科技(上海)有限公司 Business license identification method and device
CN109271611B (en) * 2018-09-06 2023-05-12 创新先进技术有限公司 Data verification method and device and electronic equipment
KR102011694B1 (en) * 2019-03-08 2019-08-19 사회보장정보원 Public institutional income property linkage data verification system and its recording medium
CN109961324A (en) * 2019-03-19 2019-07-02 山东浪潮云信息技术有限公司 A kind of electric business enterprise stamps the standardization processing method and system of region label
CN110309182B (en) * 2019-06-18 2023-08-25 中国平安财产保险股份有限公司 Information input method and device
CN110399829A (en) * 2019-07-23 2019-11-01 上海秒针网络科技有限公司 Certificate information comparison method, device, storage medium and electronic device
CN110633345B (en) * 2019-08-16 2023-04-11 创新先进技术有限公司 Method and system for identifying enterprise registration address
CN110503337A (en) * 2019-08-26 2019-11-26 付强 A kind of business strategy small watersheds, method
CN110765773A (en) * 2019-10-31 2020-02-07 北京金堤科技有限公司 Address data acquisition method and device

Also Published As

Publication number Publication date
CN112364630B (en) 2024-09-24
CN112364630A (en) 2021-02-12

Similar Documents

Publication Publication Date Title
CN110221959B (en) Application program testing method, device and computer readable medium
US11010287B1 (en) Field property extraction and field value validation using a validated dataset
CN113434542B (en) Data relationship identification method, device, electronic equipment and storage medium
CN110474900B (en) Game protocol testing method and device
CN113094625B (en) Page element positioning method and device, electronic equipment and storage medium
US11609897B2 (en) Methods and systems for improved search for data loss prevention
US9454561B2 (en) Method and a consistency checker for finding data inconsistencies in a data repository
KR20190095099A (en) Transaction system error detection method, apparatus, storage medium and computer device
CN114490692A (en) Data verification method, device, equipment and storage medium
CN112686759B (en) Account reconciliation monitoring method, device, equipment and medium
CN119088791A (en) Data verification method, device, electronic equipment, medium and product
CN112131100B (en) A front-end and back-end verification method and device based on metadata
CN113792274A (en) Information management method, management system and storage medium
US11182375B2 (en) Metadata validation tool
CA3137185A1 (en) License content error correction method, apparutus, and system
CN112948400A (en) Database management method, database management device and terminal equipment
CN111159482A (en) Data verification method and system
CN111078677A (en) Data entry method and device
CN117370160A (en) Code auditing method, apparatus, computer device, storage medium and program product
CN113434359B (en) Data traceability system construction method and device, electronic device and readable storage medium
CN113486194B (en) A method and device for preventing duplication in knowledge graphs
CN111241082B (en) Data correction method and device
CN115840748A (en) Data processing method, system and related equipment
CN114255134A (en) Account number disassembling method and device and storage medium
CN112651825A (en) Information verification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20220916
