
CN111488741A - Tax knowledge data semantic annotation method and related device - Google Patents

Tax knowledge data semantic annotation method and related device

Info

Publication number
CN111488741A
Authority
CN
China
Prior art keywords
tax
semantic
labeling
result
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010291485.XA
Other languages
Chinese (zh)
Inventor
史源源
刘勇
黄志苹
王培勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Servyou Software Group Co ltd
Original Assignee
Servyou Software Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Servyou Software Group Co ltd filed Critical Servyou Software Group Co ltd
Priority to CN202010291485.XA priority Critical patent/CN111488741A/en
Publication of CN111488741A publication Critical patent/CN111488741A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a semantic annotation method for tax knowledge data, comprising: acquiring tax regulation file data through a preset path; performing data extraction on the tax regulation file data according to the file structure to obtain tax regulation information; and semantically annotating the entity concepts and relations in the tax regulation information according to a relation triple structure to obtain a semantic annotation result. Because the tax regulation file data is semantically annotated through a preset relation triple structure, dependence on expert opinion is avoided and the accuracy and standardization of the annotation result are improved. The application also discloses a tax knowledge data semantic annotation device, a server, and a computer-readable storage medium, which have the same beneficial effects.

Description

Tax knowledge data semantic annotation method and related device
Technical Field
The application relates to the technical field of computers, in particular to a tax knowledge data semantic annotation method, a tax knowledge data semantic annotation device, a server and a computer-readable storage medium.
Background
A knowledge graph is a large-scale semantic network comprising entities, concepts, and the semantic relations among them. Unlike a general-purpose knowledge graph, such as one built from an encyclopedia, a domain knowledge graph targets a specific domain and covers it in greater depth. A tax knowledge graph, for example, consists mostly of tax-related entities and concepts. A domain knowledge graph is represented with triples. This knowledge representation defines the basic cognitive framework of the domain: it specifies which basic concepts exist in the domain and which basic semantic associations hold between them. For example, the relation between a small-scale value-added tax taxpayer and value-added tax can be a tax exemption relation, which is a piece of basic knowledge in the field of tax benefits.
In the prior art, because the domain knowledge is highly specialized, information mined directly by a computer is very inaccurate and the barrier to implementation is high. The common knowledge representation approach therefore still relies heavily on manual work: domain experts provide a thesaurus of domain-specific terms for mining domain entities, annotate the relations in part of the knowledge according to the triple structure, and extend those relations to other data that may contain similar relations by means of heuristic rules. The prior art is therefore inefficient, and manually introduced errors easily arise during data annotation.
Therefore, how to reduce the dependency on experts in the knowledge annotation process and improve the accuracy and normalization of the knowledge annotation result are important issues to be focused on by those skilled in the art.
Disclosure of Invention
The application aims to provide a tax knowledge data semantic annotation method, a tax knowledge data semantic annotation device, a server and a computer readable storage medium, wherein tax regulation file data is semantically annotated through a preset relation triple structure, so that dependence on expert opinions is avoided, and accuracy and normalization of an annotation result are improved.
In order to solve the technical problem, the application provides a tax knowledge data semantic annotation method, which comprises the following steps:
acquiring tax regulation file data through a preset path;
performing data extraction processing on the tax regulation file data according to a file structure to obtain tax regulation information;
and semantically labeling the entity concepts and the relations in the tax rule information according to the relation triple structure to obtain a semantic labeling result.
Optionally, the method further includes:
semantic labeling is carried out on entity concepts and attributes in the tax rule information according to the attribute triple structure, and an attribute labeling result is obtained;
and adding the attribute labeling result to the semantic labeling result.
Optionally, the method further includes:
performing source labeling on entity concepts in the semantic labeling result according to the tax regulation file data to obtain a knowledge source labeling result;
performing validity period labeling on the entity concepts in the semantic labeling result according to the tax regulation file data to obtain a validity period labeling result;
and adding the knowledge source labeling result and the validity period labeling result into the semantic labeling result.
Optionally, semantically labeling the entity concepts and the relations in the tax rule information according to the relation triple structure to obtain a semantic labeling result, including:
extracting keywords from the tax rule information according to a semantic database to obtain a plurality of entity concepts and a plurality of relations;
and setting corresponding relations between the entity concepts and the relations according to the relation triple structure to obtain the semantic annotation result.
The application also provides a tax knowledge data semantic annotation device, including:
the original data acquisition module is used for acquiring tax regulation file data through a preset path;
the effective information acquisition module is used for extracting and processing the data of the tax regulation file according to the file structure to obtain tax regulation information;
and the semantic annotation processing module is used for performing semantic annotation on the entity concepts and the relations in the tax rule information according to the relation triple structure to obtain a semantic annotation result.
Optionally, the device further includes:
the attribute marking processing module is used for carrying out semantic marking on the entity concepts and the attributes in the tax rule information according to the attribute triple structure to obtain an attribute marking result;
and the attribute labeling adding module is used for adding the attribute labeling result to the semantic labeling result.
Optionally, the device further includes:
the knowledge source labeling module is used for performing source labeling on the entity concepts in the semantic labeling result according to the tax regulation file data to obtain a knowledge source labeling result;
the validity period marking module is used for marking the time period of the entity concept in the semantic marking result according to the tax regulation file data to obtain a validity period marking result;
and the annotation adding module is used for adding the knowledge source annotation result and the validity period annotation result into the semantic annotation result.
Optionally, the semantic annotation processing module includes:
the keyword extraction module is used for extracting keywords from the tax rule information according to a semantic database to obtain a plurality of entity concepts and a plurality of relations;
and the relation labeling module is used for setting corresponding relations between the entity concepts and the relations according to the relation triple structure to obtain the semantic labeling result.
The present application further provides a server, comprising:
a memory for storing a computer program;
and the processor is used for realizing the steps of the tax knowledge data semantic annotation method when the computer program is executed.
The present application further provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the tax knowledge data semantic annotation method as described above.
The application provides a tax knowledge data semantic annotation method, which comprises the following steps: acquiring tax regulation file data through a preset path; performing data extraction processing on the tax regulation file data according to a file structure to obtain tax regulation information; and semantically labeling the entity concepts and the relations in the tax rule information according to the relation triple structure to obtain a semantic labeling result.
The method acquires tax regulation file data through a preset path and extracts the effective tax regulation information from it, that is, it obtains from unstructured data the information used to determine the relations between entities in tax knowledge. Finally, the entity concepts and relations in the tax regulation information are semantically annotated according to the relation triple structure to obtain the final semantic annotation result. The data that identifies the relations is thus taken directly from the tax regulation files, rather than the relations between entity concepts being obtained from expert experience, which improves the accuracy and standardization of the knowledge annotation result.
The application also provides a tax knowledge data semantic annotation device, a server and a computer readable storage medium, which have the beneficial effects, and are not repeated herein.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flowchart of a tax knowledge data semantic annotation method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a tax knowledge data semantic annotation device according to an embodiment of the present application.
Detailed Description
The core of the application is to provide a tax knowledge data semantic annotation method, a tax knowledge data semantic annotation device, a server and a computer readable storage medium, wherein the tax rule file data is semantically annotated through a preset relation triple structure, so that the dependence on expert opinions is avoided, and the accuracy and the normalization of an annotation result are improved.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the prior art, because the domain knowledge is highly specialized, information mined directly by a computer is very inaccurate and the barrier to implementation is high. The common knowledge representation approach therefore still relies heavily on manual work: domain experts provide a thesaurus of domain-specific terms for mining domain entities, annotate the relations in part of the knowledge according to the triple structure, and extend those relations to other data that may contain similar relations by means of heuristic rules. The prior art is therefore inefficient, and manually introduced errors easily arise during data annotation.
Therefore, the tax knowledge data semantic annotation method provided by the application acquires tax regulation file data through a preset path and then extracts the effective tax regulation information from it, that is, it obtains from unstructured data the information used to determine the relations between entities in tax knowledge. Finally, the entity concepts and relations in the tax regulation information are semantically annotated according to the relation triple structure to obtain the final semantic annotation result. The data that identifies the relations is thus taken directly from the tax regulation files, rather than the relations between entity concepts being obtained from expert experience, which improves the accuracy and standardization of the knowledge annotation result.
Referring to FIG. 1, FIG. 1 is a flowchart illustrating a tax knowledge data semantic annotation method according to an embodiment of the present application.
In this embodiment, the method may include:
S101, acquiring tax regulation file data through a preset path;
This step acquires tax regulation file data through a preset path. The preset path may be the website of the State Taxation Administration, the websites of regional tax bureaus, or another source that publishes tax information. The announcement files obtained are the various tax-related regulations, and retrieving the content of these announcement files yields the tax regulation file data of this step.
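As a rough illustration of S101, the following sketch treats the preset path as a list of announcement URLs and downloads each one. The function names `http_fetch` and `fetch_regulations` are hypothetical and do not come from the application; the fetcher is passed in as a parameter so the sketch can also be exercised offline with a stand-in.

```python
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def http_fetch(url: str) -> str:
    """Download one announcement page and decode it as UTF-8 text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_regulations(paths: Iterable[str],
                      fetch: Callable[[str], str] = http_fetch) -> Dict[str, str]:
    """Return {path: raw text} for every preset path that yields content."""
    documents = {}
    for path in paths:
        text = fetch(path)
        if text.strip():  # skip pages that came back empty
            documents[path] = text
    return documents


# Offline usage with a stand-in fetcher (hypothetical paths and content).
fake_pages = {"notice-a": "Sample notice text", "notice-b": "   "}
docs = fetch_regulations(["notice-a", "notice-b"], fetch=lambda p: fake_pages[p])
```

In a real deployment the list of paths would point at the tax authority pages named above; error handling and rate limiting are omitted here.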
S102, performing data extraction on the tax regulation file data according to the file structure to obtain tax regulation information;
On the basis of S101, this step performs data extraction on the tax regulation file data according to the file structure to obtain tax regulation information. Its main function is to remove useless information from the tax regulation file data and retain only the effective information that is useful for semantic annotation. The article structure of tax regulation documents is mostly fixed; for example, the validity period of a document and the existing documents it may repeal are generally stated at the beginning and end of the document. Based on these regularities, the effective information in the tax regulation file data can be extracted with regular expressions to obtain the tax regulation information.
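The regular-expression extraction described for S102 might look like the sketch below. The sample notice, the English phrasings, and the names `VALIDITY_RE`, `REPEAL_RE`, and `extract_rule_info` are all invented for illustration; real notices would need patterns written for their own wording.

```python
import re

# Assumed phrasings for the validity period and the repeal statement.
VALIDITY_RE = re.compile(
    r"effective from (\d{4}-\d{2}-\d{2}) to (\d{4}-\d{2}-\d{2})")
REPEAL_RE = re.compile(r"^(.+?) is repealed\.$", re.MULTILINE)


def extract_rule_info(text):
    """Pull the validity period and repealed documents out of raw notice text."""
    validity = VALIDITY_RE.search(text)
    return {
        "valid_from": validity.group(1) if validity else None,
        "valid_to": validity.group(2) if validity else None,
        "repeals": REPEAL_RE.findall(text),
    }


# Invented sample notice, not an excerpt from any real regulation.
SAMPLE_NOTICE = (
    "Sample notice on a tax relief policy\n"
    "Article 1. Taxpayers of type A are exempt from tax X.\n"
    "This notice is effective from 2019-01-01 to 2021-12-31.\n"
    "Sample Notice No. 1 is repealed.\n"
)
info = extract_rule_info(SAMPLE_NOTICE)
```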
S103, semantically annotating the entity concepts and relations in the tax regulation information according to the relation triple structure to obtain a semantic annotation result.
On the basis of S102, this step semantically annotates the entity concepts and relations in the tax regulation information according to a relation triple structure designed in advance, to obtain the semantic annotation result.
Knowledge annotation requires the ability to annotate entities/concepts and relations. Concepts generally refer to entities in the text that carry particular meaning or are strongly referential, and typically include names of people, places, and organizations, dates and times, proper nouns, and the like. Take the sentence "Small-scale value-added tax taxpayers with monthly sales of 100,000 yuan or less (inclusive) are exempt from value-added tax" as an example: both 'small-scale value-added tax taxpayer' and 'value-added tax' are concepts. According to the experience of tax experts, in a knowledge graph of the tax benefits field a concept can be restricted to either a taxpayer or a tax type.
For the two concepts 'small-scale value-added tax taxpayer' and 'value-added tax', 'exemption' is the relation between them, that is, a specific preferential tax policy, and this can be generalized into a relation triple of the form concept -> relation -> concept. The relations in the tax-benefit knowledge graph are the preferential relations between a taxpayer and a tax type, and they fall into two types, tax reduction and tax exemption, whose specific applicable scenarios and conditions differ.
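The concept -> relation -> concept structure can be written down as a small immutable record. This is only one possible representation; the class name `RelationTriple` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationTriple:
    subject: str   # taxpayer concept
    relation: str  # preferential relation between the two concepts
    obj: str       # tax-type concept

    def as_text(self) -> str:
        """Render the triple in the concept -> relation -> concept form."""
        return f"{self.subject} -> {self.relation} -> {self.obj}"


triple = RelationTriple("small-scale VAT taxpayer", "exemption", "value-added tax")
print(triple.as_text())
```

Making the record frozen keeps triples hashable, so duplicates found in several sentences collapse naturally when they are collected in a set.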
Optionally, S103 may include:
Step 1, extracting keywords from the tax regulation information according to a semantic database to obtain a plurality of entity concepts and a plurality of relations;
Step 2, setting correspondences between the entity concepts and the relations according to the relation triple structure to obtain the semantic annotation result.
This alternative explains how the semantic annotation result is obtained. Specifically, keywords are first extracted from the tax regulation information, that is, a plurality of concepts and a plurality of relations are extracted from it. Correspondences are then set between the related entity concepts and relations according to the preset relation triple structure, which yields relation triples, each linking an entity concept, a relation, and another entity concept, as the final annotation result.
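The two steps of this alternative can be sketched with a tiny lexicon standing in for the semantic database. The lexicon contents, the cue phrases, and the function names are assumptions made for illustration, and the naive substring matching is a stand-in for whatever keyword extraction a real system would use.

```python
# Hypothetical semantic database: known concepts with their kind, and cue
# phrases mapped to relation labels.
SEMANTIC_DB = {
    "concepts": {
        "small-scale VAT taxpayer": "taxpayer",
        "value-added tax": "tax",
    },
    "relations": {"exempt from": "exemption", "reduced": "reduction"},
}


def extract_keywords(sentence, db):
    """Step 1: find the concepts and relation cues present in the sentence."""
    lowered = sentence.lower()
    concepts = [(name, kind) for name, kind in db["concepts"].items()
                if name.lower() in lowered]
    relations = [label for cue, label in db["relations"].items()
                 if cue in lowered]
    return concepts, relations


def build_triples(concepts, relations):
    """Step 2: pair every taxpayer with every tax type through each relation."""
    taxpayers = [name for name, kind in concepts if kind == "taxpayer"]
    taxes = [name for name, kind in concepts if kind == "tax"]
    return [(tp, rel, tax) for tp in taxpayers for rel in relations for tax in taxes]


sentence = ("Small-scale VAT taxpayers with monthly sales of 100,000 yuan "
            "or less are exempt from value-added tax.")
triples = build_triples(*extract_keywords(sentence, SEMANTIC_DB))
```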
Optionally, this embodiment may further include:
semantic labeling is carried out on entity concepts and attributes in the tax rule information according to the attribute triple structure, and an attribute labeling result is obtained;
and adding the attribute labeling result to the semantic labeling result.
Similarly, in this alternative, semantic annotation is performed according to the attribute triple structure, and the relation between an entity concept, an attribute, and its attribute value is obtained as the attribute annotation result, which is finally added to the semantic annotation result.
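One way to add attribute annotations to an existing semantic annotation result is shown below. The dictionary layout with `relations` and `attributes` keys and the function name `add_attribute_annotations` are assumptions, not a format defined in the application.

```python
def add_attribute_annotations(semantic_result, attribute_triples):
    """Return a new result that also carries the attribute triples, without duplicates."""
    merged = {
        "relations": list(semantic_result.get("relations", [])),
        "attributes": list(semantic_result.get("attributes", [])),
    }
    for triple in attribute_triples:
        if triple not in merged["attributes"]:
            merged["attributes"].append(triple)
    return merged


semantic_result = {
    "relations": [("small-scale VAT taxpayer", "exemption", "value-added tax")],
}
attribute_triples = [
    ("small-scale VAT taxpayer", "monthly sales ceiling", "100,000 yuan"),
    ("small-scale VAT taxpayer", "monthly sales ceiling", "100,000 yuan"),  # duplicate
]
combined = add_attribute_annotations(semantic_result, attribute_triples)
```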
Optionally, this embodiment may further include:
according to tax regulation file data, carrying out source labeling on entity concepts in the semantic labeling result to obtain a knowledge source labeling result;
performing validity period labeling on the entity concepts in the semantic labeling result according to the tax regulation file data to obtain a validity period labeling result;
and adding the knowledge source labeling result and the validity period labeling result into the semantic labeling result.
It can be seen that the knowledge source and the validity period of each entity concept can be marked through the alternative.
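A minimal sketch of this alternative follows, assuming the semantic annotation result is a dictionary holding relation triples and that the source document identifier and validity dates have already been extracted. The function name, the dictionary keys, and the identifier `DOC-SAMPLE` are hypothetical.

```python
def annotate_provenance(semantic_result, source_doc, valid_from, valid_to):
    """Attach a knowledge source and validity period to every entity concept."""
    concepts = set()
    for subject, _relation, obj in semantic_result["relations"]:
        concepts.update((subject, obj))
    annotated = dict(semantic_result)  # shallow copy; the input is left untouched
    annotated["sources"] = {c: source_doc for c in sorted(concepts)}
    annotated["validity"] = {c: (valid_from, valid_to) for c in sorted(concepts)}
    return annotated


result = annotate_provenance(
    {"relations": [("small-scale VAT taxpayer", "exemption", "value-added tax")]},
    source_doc="DOC-SAMPLE",
    valid_from="2019-01-01",
    valid_to="2021-12-31",
)
```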
In summary, this embodiment acquires tax regulation file data through a preset path and then extracts the effective tax regulation information from it, that is, it obtains from unstructured data the information used to determine the relations between entities in tax knowledge. Finally, the entity concepts and relations in the tax regulation information are semantically annotated according to the relation triple structure to obtain the final semantic annotation result. The data that identifies the relations is thus taken directly from the tax regulation files, rather than the relations between entity concepts being obtained from expert experience, which improves the accuracy and standardization of the knowledge annotation result.
The tax knowledge data semantic annotation method provided by the present application is further described below by another specific embodiment.
In this embodiment, the method may include:
Step 1: tax regulation acquisition
The website of the State Taxation Administration publishes the announcement files of the various tax regulations issued by the Ministry of Finance and the tax administration. The content of the announcement webpages related to tax benefits can therefore be collected by a web crawler and stored as Word documents.
Step 2: rule-based information extraction
This step is intended to reduce the number of documents that must be read manually.
The article structure of tax regulations mostly follows a fixed format: the beginning and end of a document generally state its validity period and the existing documents it may repeal, and the body sections describe the benefits that apply to different taxpayers for different tax types. Based on these regularities, the information is extracted with regular expressions, and the document is split into segmented text for further processing.
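Splitting a document into segmented text could be done line by line, as in the sketch below. It assumes that body sections start with a marker such as "Article N." (an invented English convention) and that any lines before the first marker form a preamble; `segment_document` is a hypothetical name.

```python
import re

# Assumed section marker; a real notice would use its own numbering style.
ARTICLE_START = re.compile(r"^Article (\d+)\.\s*(.*)$")


def segment_document(text):
    """Split a document into a preamble and numbered article segments."""
    segments = {"preamble": [], "articles": {}}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = ARTICLE_START.match(line)
        if match:
            current = int(match.group(1))
            segments["articles"][current] = match.group(2)
        elif current is None:
            segments["preamble"].append(line)
        else:
            # A wrapped line continues the article that is currently open.
            segments["articles"][current] += " " + line
    return segments


doc = (
    "Sample notice\n"
    "Article 1. Taxpayers of type A\n"
    "are exempt from tax X.\n"
    "Article 2. Taxpayers of type B pay reduced tax Y.\n"
)
segments = segment_document(doc)
```

Closing lines such as a validity statement would be appended to the last article by this simple rule, so a fuller version would need a separate marker for the closing part.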
Step 3: Annotation template
For the tax-benefit knowledge graph, the knowledge sources are national policies and regulations, which are difficult for people without tax industry experience to read directly. A knowledge annotation template for tax benefits is therefore formulated by combining the triple construction requirements of the knowledge graph with the specific characteristics of knowledge sources in the tax benefits field.
For ease of understanding, this embodiment takes Cai Shui (Finance and Tax) [2019] No. 13, the inclusive tax relief policy for small and micro enterprises, as an example.
Knowledge annotation requires the ability to annotate entities/concepts and relations. Concepts generally refer to entities in the text that carry particular meaning or are strongly referential, and typically include names of people, places, and organizations, dates and times, proper nouns, and the like. Take the sentence "Small-scale value-added tax taxpayers with monthly sales of 100,000 yuan or less (inclusive) are exempt from value-added tax" as an example: both 'small-scale value-added tax taxpayer' and 'value-added tax' are concepts. According to the experience of tax experts, in a knowledge graph of the tax benefits field a concept can be restricted to either a taxpayer or a tax type.
For the two concepts 'small-scale value-added tax taxpayer' and 'value-added tax', 'exemption' is the relation between them, that is, a specific preferential tax policy, and this can be generalized into a relation triple of the form concept -> relation -> concept. The relations in the tax-benefit knowledge graph are the preferential relations between a taxpayer and a tax type, and they fall into two types, tax reduction and tax exemption, whose specific applicable scenarios and conditions differ.
Take the sentence "A small low-profit enterprise refers to an enterprise that engages in an industry not restricted or prohibited by the state and that simultaneously meets three conditions: annual taxable income not exceeding 3 million yuan, no more than 300 employees, and total assets not exceeding 50 million yuan." In this sentence, the text following 'refers to' is the definition of the small low-profit enterprise. This is a relation of the form "the definition of A is B", and it can be generalized into an attribute triple of the form concept -> attribute -> attribute value.
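A definition sentence of the "A refers to B" kind can be turned into a concept -> attribute -> attribute value triple with a single pattern. The sketch below works on an English paraphrase of the definition; the pattern, the attribute label `definition`, and the function name are assumptions for illustration.

```python
import re

# Optional leading article, then the defined concept, a cue word, and the definition.
DEFINITION_RE = re.compile(
    r"^(?:(?:An?|The)\s+)?(.+?) (?:refers to|means) (.+?)\.?$",
    re.IGNORECASE,
)


def definition_to_attribute_triple(sentence):
    """Return (concept, 'definition', value) or None if no definition cue is found."""
    match = DEFINITION_RE.match(sentence.strip())
    if match is None:
        return None
    return (match.group(1), "definition", match.group(2))


triple = definition_to_attribute_triple(
    "A small low-profit enterprise refers to an enterprise with annual "
    "taxable income of no more than 3 million yuan."
)
```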
Finally, owing to the particular nature of the tax field, each tax regulation states its validity period when it is issued and may repeal some earlier policies and regulations. To make the source of every concept and relation traceable, knowledge sources need to be managed separately, so the template also provides knowledge source annotation.
In summary, the tax-benefit knowledge graph annotation template constructed by this method is divided into attribute triple annotation, relation triple annotation, and knowledge source annotation.
Take Cai Shui [2019] No. 13 as an example.
Step 3.1: attribute triple tagging
The concept is a taxpayer or a tax type, the attributes are those mentioned in the document, and the attribute values are the values those attributes take. In addition, each concept must be annotated with the knowledge source from which it originates.
Table 1. Attribute triple schematic
[Table image: BDA0002450564760000091]
Step 3.2: relationship triple labeling
A relation in the tax benefits field necessarily holds between a taxpayer concept and a tax type concept, and there are only two relations, tax reduction and tax exemption, whose specific scenarios and conditions differ. Relations must also be annotated with their source.
Table 2. Relation triple schematic
[Table image: BDA0002450564760000092]
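The constraints stated for relation triple labeling (taxpayer on one side, tax type on the other, one of two allowed relations, and a mandatory source) lend themselves to a simple row check. The row layout, the label strings, and the function name below are hypothetical; the actual template columns are given only in the table image.

```python
ALLOWED_RELATIONS = {"tax reduction", "tax exemption"}


def validate_relation_row(row, concept_kinds):
    """Return a list of problems found in one filled-in relation row."""
    errors = []
    if concept_kinds.get(row.get("subject")) != "taxpayer":
        errors.append("subject must be a taxpayer concept")
    if concept_kinds.get(row.get("object")) != "tax":
        errors.append("object must be a tax type concept")
    if row.get("relation") not in ALLOWED_RELATIONS:
        errors.append("relation must be tax reduction or tax exemption")
    if not row.get("source"):
        errors.append("source annotation is missing")
    return errors


concept_kinds = {"small-scale VAT taxpayer": "taxpayer", "value-added tax": "tax"}
good_row = {"subject": "small-scale VAT taxpayer", "relation": "tax exemption",
            "object": "value-added tax", "source": "DOC-SAMPLE"}
bad_row = {"subject": "value-added tax", "relation": "refund",
           "object": "value-added tax", "source": ""}
```

Running the check on every row before graph construction catches template mistakes early, which matters because the rows are still filled in by people.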
Step 3.3: knowledge source annotation
Each tax regulation serving as a knowledge source defines either certain entities/concepts or certain relations; it also states its validity period and which other tax regulation files it repeals, all of which are annotated in the template.
Table 3. Knowledge source annotation schematic
[Table image: BDA0002450564760000101]
Table 4. Validity period annotation
[Table image: BDA0002450564760000102]
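Once validity periods and repealed documents are annotated, it becomes possible to ask which knowledge sources apply on a given date. The sketch below is one way to model that; the class `KnowledgeSource`, its fields, and the document identifiers are hypothetical, and the rule that a repeal takes effect on the repealing document's start date is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class KnowledgeSource:
    doc_id: str
    valid_from: date
    valid_to: Optional[date] = None          # None means no stated end date
    repeals: List[str] = field(default_factory=list)

    def in_force(self, on: date) -> bool:
        """True if the date falls inside the document's validity period."""
        if on < self.valid_from:
            return False
        return self.valid_to is None or on <= self.valid_to


def active_sources(sources, on):
    """List documents in force on the date and not repealed by an earlier-starting one."""
    repealed = {r for s in sources if s.valid_from <= on for r in s.repeals}
    return [s.doc_id for s in sources
            if s.in_force(on) and s.doc_id not in repealed]


old = KnowledgeSource("DOC-OLD", date(2018, 1, 1))
new = KnowledgeSource("DOC-NEW", date(2019, 1, 1), date(2021, 12, 31), ["DOC-OLD"])
```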
After the template is filled in, the knowledge annotation of the tax regulations is complete, and the subsequent knowledge graph construction can proceed.
This embodiment therefore acquires tax regulation file data through a preset path and then extracts the effective tax regulation information from it, that is, it obtains from unstructured data the information used to determine the relations between entities in tax knowledge. Finally, the entity concepts and relations in the tax regulation information are semantically annotated according to the relation triple structure to obtain the final semantic annotation result. The data that identifies the relations is thus taken directly from the tax regulation files, rather than the relations between entity concepts being obtained from expert experience, which improves the accuracy and standardization of the knowledge annotation result.
In the following, the tax knowledge data semantic annotation device provided in the embodiment of the present application is introduced, and the tax knowledge data semantic annotation device described below and the tax knowledge data semantic annotation method described above may be referred to in a corresponding manner.
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a tax knowledge data semantic annotation device according to an embodiment of the present application.
In this embodiment, the apparatus may include:
the original data acquisition module 100 is used for acquiring tax regulation file data through a preset path;
the effective information acquisition module 200 is used for extracting and processing data of the tax regulation file data according to the file structure to obtain tax regulation information;
and the semantic annotation processing module 300 is configured to perform semantic annotation on the entity concepts and the relationships in the tax rule information according to the relationship triple structure to obtain a semantic annotation result.
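The three modules can be pictured as stages chained in order, as in the sketch below. The class name `AnnotationDevice` and the use of plain callables for the modules are assumptions made for illustration; the application does not prescribe how the modules are implemented.

```python
class AnnotationDevice:
    """Chains the three modules: acquisition, extraction, semantic annotation."""

    def __init__(self, acquire, extract, annotate):
        self.acquire = acquire    # plays the role of module 100
        self.extract = extract    # plays the role of module 200
        self.annotate = annotate  # plays the role of module 300

    def run(self, preset_path):
        raw_data = self.acquire(preset_path)
        rule_info = self.extract(raw_data)
        return self.annotate(rule_info)


# Stand-in modules so the sketch runs without any real data source.
device = AnnotationDevice(
    acquire=lambda path: "  Taxpayer A is exempt from tax X.  ",
    extract=lambda raw: raw.strip(),
    annotate=lambda info: [("Taxpayer A", "exemption", "tax X")] if "exempt" in info else [],
)
output = device.run("preset-path")
```

Keeping each module behind a callable makes it straightforward to add the optional attribute, knowledge source, and validity period modules as further stages.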
Optionally, the apparatus may further include:
the attribute marking processing module is used for carrying out semantic marking on the entity concepts and the attributes in the tax rule information according to the attribute triple structure to obtain an attribute marking result;
and the attribute labeling adding module is used for adding the attribute labeling result to the semantic labeling result.
Optionally, the apparatus may further include:
the knowledge source labeling module is used for carrying out source labeling on entity concepts in the semantic labeling result according to tax regulation file data to obtain a knowledge source labeling result;
the validity period marking module is used for marking the time period of the entity concept in the semantic marking result according to the tax regulation file data to obtain a validity period marking result;
and the annotation adding module is used for adding the knowledge source annotation result and the validity period annotation result to the semantic annotation result.
Optionally, the semantic annotation processing module 300 may include:
the keyword extraction module is used for extracting keywords from the tax rule information according to the semantic database to obtain a plurality of entity concepts and a plurality of relations;
and the relation labeling module is used for setting corresponding relations to the entity concepts and the relations according to the relation triple structure to obtain a semantic labeling result.
An embodiment of the present application further provides a server, including:
a memory for storing a computer program;
a processor, configured to implement the steps of the tax knowledge data semantic annotation method according to the above embodiment when the computer program is executed.
The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method for semantic annotation of tax knowledge data according to the foregoing embodiment is implemented.
The computer-readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the others, and for the parts that are the same or similar the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The tax knowledge data semantic annotation method, tax knowledge data semantic annotation device, server and computer readable storage medium provided by the application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
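As an informal illustration of the first two steps of the method summarized above (acquisition through a preset path and extraction according to the file structure), the following Python sketch may help. It assumes that the preset path is a local directory of UTF-8 `.txt` files and that each file consists of a title line followed by clauses beginning with "Article N"; this layout, the regular expression, and the function names are assumptions made for the example and are not taken from the application.

```python
import re
from pathlib import Path
from typing import Any, Dict, List

# Assumed clause heading format: "Article <number> <text>".
ARTICLE_RE = re.compile(r"^Article\s+(\d+)\s+(.*)$")


def acquire_files(preset_path: str) -> List[str]:
    """Read every .txt regulation file found directly under the preset directory."""
    return [p.read_text(encoding="utf-8") for p in sorted(Path(preset_path).glob("*.txt"))]


def extract_regulation_info(raw: str) -> Dict[str, Any]:
    """Split one file into a title and numbered articles, following its layout."""
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    info: Dict[str, Any] = {"title": lines[0] if lines else "", "articles": {}}
    current = None
    for ln in lines[1:]:
        match = ARTICLE_RE.match(ln)
        if match:
            # A new article heading starts a new entry.
            current = int(match.group(1))
            info["articles"][current] = match.group(2)
        elif current is not None:
            # A continuation line is appended to the article it belongs to.
            info["articles"][current] += " " + ln
    return info
```

The extracted articles would then be passed to the semantic annotation step; lines that appear between the title and the first article are ignored in this sketch.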

Claims (10)

1. A tax knowledge data semantic annotation method, characterized by comprising:
acquiring tax regulation file data through a preset path;
performing data extraction on the tax regulation file data according to a file structure to obtain tax regulation information;
and performing semantic annotation on the entity concepts and relations in the tax regulation information according to a relation triple structure to obtain a semantic annotation result.
2. The tax knowledge data semantic annotation method of claim 1, further comprising:
performing semantic annotation on the entity concepts and attributes in the tax regulation information according to an attribute triple structure to obtain an attribute annotation result;
and adding the attribute annotation result to the semantic annotation result.
3. The tax knowledge data semantic annotation method of claim 1, further comprising:
performing source annotation on the entity concepts in the semantic annotation result according to the tax regulation file data to obtain a knowledge source annotation result;
performing validity period annotation on the entity concepts in the semantic annotation result according to the tax regulation file data to obtain a validity period annotation result;
and adding the knowledge source annotation result and the validity period annotation result to the semantic annotation result.
4. The tax knowledge data semantic annotation method of claim 1, wherein performing semantic annotation on the entity concepts and relations in the tax regulation information according to the relation triple structure to obtain the semantic annotation result comprises:
extracting keywords from the tax regulation information according to a semantic database to obtain a plurality of entity concepts and a plurality of relations;
and associating the entity concepts with the relations according to the relation triple structure to obtain the semantic annotation result.
5. A tax knowledge data semantic annotation device, characterized by comprising:
the original data acquisition module is used for acquiring tax regulation file data through a preset path;
the effective information acquisition module is used for performing data extraction on the tax regulation file data according to a file structure to obtain tax regulation information;
and the semantic annotation processing module is used for performing semantic annotation on the entity concepts and relations in the tax regulation information according to a relation triple structure to obtain a semantic annotation result.
6. The tax knowledge data semantic annotation device of claim 5, further comprising:
the attribute annotation processing module is used for performing semantic annotation on the entity concepts and attributes in the tax regulation information according to an attribute triple structure to obtain an attribute annotation result;
and the attribute annotation adding module is used for adding the attribute annotation result to the semantic annotation result.
7. The tax knowledge data semantic annotation device of claim 5, further comprising:
the knowledge source annotation module is used for performing source annotation on the entity concepts in the semantic annotation result according to the tax regulation file data to obtain a knowledge source annotation result;
the validity period annotation module is used for performing validity period annotation on the entity concepts in the semantic annotation result according to the tax regulation file data to obtain a validity period annotation result;
and the annotation adding module is used for adding the knowledge source annotation result and the validity period annotation result to the semantic annotation result.
8. The tax knowledge data semantic annotation device according to claim 5, wherein the semantic annotation processing module comprises:
the keyword extraction module is used for extracting keywords from the tax regulation information according to a semantic database to obtain a plurality of entity concepts and a plurality of relations;
and the relation annotation module is used for associating the entity concepts with the relations according to the relation triple structure to obtain the semantic annotation result.
9. A server, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the tax knowledge data semantic annotation method as claimed in any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the steps of the tax knowledge data semantic annotation method according to any one of claims 1 to 4.
CN202010291485.XA 2020-04-14 2020-04-14 Tax knowledge data semantic annotation method and related device Pending CN111488741A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010291485.XA CN111488741A (en) 2020-04-14 2020-04-14 Tax knowledge data semantic annotation method and related device

Publications (1)

Publication Number Publication Date
CN111488741A (en) 2020-08-04

Family

ID=71795003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010291485.XA Pending CN111488741A (en) 2020-04-14 2020-04-14 Tax knowledge data semantic annotation method and related device

Country Status (1)

Country Link
CN (1) CN111488741A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642976A (en) * 2021-05-14 2021-11-12 深圳航天科创实业有限公司 An information system based on the acquisition of policies and regulations by enterprises
CN114706993A (en) * 2022-02-23 2022-07-05 税友信息技术有限公司 A text entity linking method, system, electronic device and storage medium
CN115063216A (en) * 2022-06-23 2022-09-16 平安银行股份有限公司 Intelligent tax declaring method based on rule engine, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2020-08-04