CN119622930A - A retrieval method, system and electronic device for tax knowledge question and answer system - Google Patents

Info

Publication number
CN119622930A
Authority
CN
China
Prior art keywords
tax
sequence
knowledge
word segmentation
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411785195.5A
Other languages
Chinese (zh)
Inventor
刘严文
卢中青
梁国松
梁棣昭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Raymon Technology Co ltd
Original Assignee
Guangdong Raymon Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Raymon Technology Co ltd
Priority to CN202411785195.5A
Publication of CN119622930A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/10 Geometric CAD
    • G06F30/15 Vehicle, aircraft or watercraft design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • G06Q50/184 Intellectual property management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Technology Law (AREA)
  • Mathematical Physics (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Operations Research (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Human Computer Interaction (AREA)
  • Marketing (AREA)
  • Automation & Control Theory (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

The present application relates to the technical field of retrieval, and in particular to a retrieval method, a retrieval system, and an electronic device for a tax knowledge question-answering system. The method first performs word segmentation and part-of-speech tagging on a query sentence to form basic language units, then identifies standard terms by using a pre-constructed tax professional dictionary, extracts key information based on preset tax element identification rules, and finally constructs a query vector that incorporates professional knowledge to perform retrieval. The professional terms and key elements in the query sentence can thus be accurately identified, which improves the intelligence level and retrieval quality of tax question answering.

Description

Retrieval method, system and electronic device for a tax knowledge question-answering system
Technical Field
The present application relates to the technical field of retrieval, and in particular to a retrieval method, a retrieval system, and an electronic device for a tax knowledge question-answering system.
Background
As tax policies grow more complex, taxpayers' demand for tax consultation continues to grow. To improve the efficiency and quality of tax services, tax authorities in many regions are actively exploring intelligent question-answering technology, which provides taxpayers with timely and accurate consultation by automatically retrieving and matching relevant policy documents.
Existing tax knowledge retrieval systems mainly use a vector space model: user queries and knowledge base documents are converted into feature vectors, and relevant tax policy documents are retrieved by computing the cosine similarity between the vectors.
However, existing retrieval systems do not take the specialized nature of the tax field into account and cannot accurately identify the professional terms and key elements in a query sentence. As a result, the retrieval results can deviate substantially from the user's actual consultation intention, which degrades question-answering quality and calls for improvement.
Disclosure of Invention
To solve the problem that existing retrieval systems cannot accurately identify the professional terms and key elements in query sentences, which degrades question-answering quality, the present application provides a retrieval method, a retrieval system, and an electronic device for a tax knowledge question-answering system, and adopts the following technical solutions:
In a first aspect, the present application provides a retrieval method for a tax knowledge question-answering system, comprising the following steps:
receiving a tax question query sentence input by a user;
performing word segmentation and part-of-speech tagging on the query sentence to obtain a word segmentation sequence with part-of-speech tags;
based on the word segmentation sequence, identifying and marking the tax professional terms in the word segmentation sequence by using a pre-constructed tax professional dictionary, to generate a target sequence with tax term marks;
based on the target sequence, extracting tax elements by using preset tax element identification rules;
and constructing a query vector from the extracted tax elements, and performing retrieval matching in a tax knowledge base.
With this technical solution: the prior art cannot identify some complete tax concepts as single units and cannot extract key tax elements, so retrieval results deviate from the user's actual consultation intention. The present application first performs word segmentation and part-of-speech tagging on the query sentence to form basic language units, then identifies standard terms by using the pre-constructed tax professional dictionary, extracts key information based on the preset tax element identification rules, and finally constructs a query vector that incorporates professional knowledge to perform retrieval. The professional terms and key elements in the query sentence can thus be accurately identified, which improves the intelligence level and retrieval quality of tax question answering.
Optionally, performing word segmentation and part-of-speech tagging on the query sentence to obtain a word segmentation sequence with part-of-speech tags specifically comprises the following steps:
performing preliminary word segmentation on the query sentence by using a word segmenter to obtain an initial word segmentation sequence;
matching the initial word segmentation sequence against a preset tax abbreviation comparison table to identify tax professional abbreviations;
replacing the identified tax professional abbreviations with the corresponding standard terms according to the tax abbreviation comparison table, to obtain a standardized word segmentation sequence;
and performing part-of-speech tagging on the standardized word segmentation sequence to obtain the word segmentation sequence with part-of-speech tags.
With this technical solution, to address the poor retrieval performance caused by the many non-standard abbreviations used in tax query sentences, the method first obtains an initial sequence with the word segmenter, then identifies abbreviations through the tax abbreviation comparison table and replaces them with standard terms to obtain a standardized sequence, and finally performs part-of-speech tagging to obtain a tagged sequence. Query sentences containing abbreviations can thus be understood more accurately, which improves the user experience.
Optionally, identifying and marking, based on the word segmentation sequence, the tax professional terms in the word segmentation sequence by using a pre-constructed tax professional dictionary to generate a target sequence with tax term marks specifically comprises the following steps:
matching the consecutive phrases in the word segmentation sequence against the tax professional dictionary to identify standard tax professional terms, obtaining a preliminary marking sequence;
calculating, for each unrecognized phrase in the preliminary marking sequence, its character similarity to each term in the tax professional dictionary, to obtain a similarity result set;
screening, from the similarity result set, the phrases whose similarity is higher than a preset threshold together with their corresponding standard tax professional terms, to obtain a fuzzy matching sequence;
and combining the preliminary marking sequence and the fuzzy matching sequence to generate the target sequence with tax term marks.
With this technical solution, to address the incomplete recognition of professional terms caused by users' non-standard expressions in tax queries, the method first matches the word segmentation sequence against the tax professional dictionary to identify the standard terms that match exactly. For each unmatched phrase, it then calculates the edit distance or the degree of character overlap with the terms in the dictionary, and marks the phrase as the corresponding standard term when the similarity exceeds a preset threshold. Finally, it combines the preliminary marking and fuzzy matching results to generate a complete term marking sequence. This covers expression variants more comprehensively and improves both the precision and the recall of term recognition.
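The fuzzy matching step can be illustrated with a minimal sketch. It assumes similarity is defined as one minus the edit distance divided by the longer string's length; the source names edit distance and character overlap but gives no formula, and the function names and the 0.8 threshold are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_match(phrase, terms, threshold=0.8):
    """Return (standard_term, similarity); the term is None unless the
    best similarity is strictly higher than the (assumed) threshold."""
    best_term, best_sim = None, 0.0
    for term in terms:
        longest = max(len(phrase), len(term), 1)
        sim = 1 - edit_distance(phrase, term) / longest
        if sim > best_sim:
            best_term, best_sim = term, sim
    if best_sim > threshold:
        return best_term, best_sim
    return None, best_sim


term, sim = fuzzy_match("value aded tax", ["value added tax", "stamp tax"])
print(term, round(sim, 3))
```

A phrase that falls below the threshold stays unmarked, so an over-permissive threshold is the main way this step could introduce wrong term marks.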
Optionally, extracting tax elements by using preset tax element identification rules based on the target sequence specifically comprises the following steps:
matching the target sequence against a preset associated word list, and identifying the associated words in the sequence and their positions, to obtain an associated word position sequence;
dividing the target sequence into a plurality of subsequences according to the associated word position sequence, to obtain a subsequence set;
extracting tax elements from each subsequence in the subsequence set by using the preset tax element identification rules, to obtain a preliminary element set;
and marking the logical relations among the elements in the preliminary element set according to the semantic types of the associated words in the associated word position sequence, to generate a complete element set containing the elements and their relations.
With this technical solution, to handle complex queries, the method first identifies the positions of the associated words and divides the query sequence into several subsequences, then extracts basic elements such as tax types, tax rates, and policy types from the subsequences, and finally marks, according to the semantic features of the associated words, the parallel relations between tax type elements and the subordinate relations between tax type elements and attribute elements, which improves the accuracy of complex queries.
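The four steps above can be sketched as follows. This is an illustration only: the associated word list, its semantic types, and the two lookup sets standing in for the "preset tax element identification rules" are all hypothetical, and the target sequence is assumed to be a plain list of already-marked terms.

```python
# Hypothetical associated words and their semantic types.
CONNECTIVES = {"and": "parallel", "or": "parallel", "of": "subordinate"}
# Hypothetical stand-ins for the preset element identification rules.
TAX_TYPES = {"value-added tax", "enterprise income tax"}
ATTRIBUTES = {"tax rate", "threshold"}


def extract_elements(sequence):
    # Step 1: locate associated words and record their positions and types.
    positions = [(i, CONNECTIVES[tok]) for i, tok in enumerate(sequence)
                 if tok in CONNECTIVES]

    # Step 2: split the sequence into subsequences at those positions.
    subsequences, start = [], 0
    for pos, _ in positions:
        subsequences.append(sequence[start:pos])
        start = pos + 1
    subsequences.append(sequence[start:])

    # Step 3: extract elements from each subsequence by rule lookup.
    groups = []
    for sub in subsequences:
        found = [("tax_type", tok) for tok in sub if tok in TAX_TYPES]
        found += [("attribute", tok) for tok in sub if tok in ATTRIBUTES]
        groups.append(found)

    # Step 4: relate elements on either side of each associated word,
    # labelling the relation with that word's semantic type.
    relations = []
    for k, (_, rel) in enumerate(positions):
        for _, left in groups[k]:
            for _, right in groups[k + 1]:
                relations.append((left, rel, right))

    elements = [elem for group in groups for elem in group]
    return elements, relations


elements, relations = extract_elements(
    ["tax rate", "of", "value-added tax", "and", "enterprise income tax"])
print(elements)
print(relations)
```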
Optionally, after the retrieval matching is performed in the tax knowledge base, the method further comprises the following steps:
combining the candidate knowledge list obtained by retrieval with the user's original query sentence to construct an input prompt;
feeding the input prompt into a large language model to obtain a relevance analysis result for each candidate knowledge item;
extracting a relevance score for each candidate knowledge item from the relevance analysis result;
re-ranking the candidate knowledge items according to the relevance scores to generate a result list;
and judging whether the highest score in the result list exceeds a preset threshold, and if so, returning the corresponding knowledge item as the reply.
With this technical solution, to address the inaccurate results produced by traditional tax knowledge retrieval systems that rely on vector similarity alone and therefore cannot fully capture query semantics, the method first combines the user query and the candidate knowledge into a prompt and inputs it into a large language model to obtain a detailed relevance analysis, then extracts relevance scores from the analysis result, and finally re-ranks the candidates by score and returns the knowledge item whose score exceeds the threshold as the final answer. This improves the method's understanding of query intent and thereby the accuracy of the retrieval results and the user experience.
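A minimal sketch of this re-ranking flow follows. The prompt wording, the "index: score" answer format, the 0.8 threshold, and the `llm` callable are all assumptions made for illustration; the source specifies neither a prompt template nor a particular model API, so the model is passed in as a plain function.

```python
import re


def build_prompt(query, candidates):
    """Combine the original query and the candidate list into one prompt."""
    lines = [
        f"Query: {query}",
        "Rate the relevance of each candidate to the query from 0 to 1.",
        "Answer with one line per candidate in the form '<index>: <score>'.",
    ]
    lines += [f"[{i}] {text}" for i, text in enumerate(candidates)]
    return "\n".join(lines)


def parse_scores(analysis, n):
    """Extract per-candidate scores from the model's text; missing ones stay 0."""
    scores = [0.0] * n
    pattern = r"^\s*\[?(\d+)\]?\s*:\s*(\d+(?:\.\d+)?)"
    for idx, val in re.findall(pattern, analysis, flags=re.M):
        if int(idx) < n:
            scores[int(idx)] = float(val)
    return scores


def rerank(query, candidates, llm, threshold=0.8):
    """Return (ranked list, answer); answer is None if no score beats the threshold."""
    analysis = llm(build_prompt(query, candidates))
    scores = parse_scores(analysis, len(candidates))
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    answer = ranked[0][0] if ranked and ranked[0][1] > threshold else None
    return ranked, answer


# Stand-in for a real model call, used only to exercise the flow.
fake_llm = lambda prompt: "0: 0.35\n1: 0.92"
ranked, answer = rerank("VAT threshold for small-scale taxpayers",
                        ["stamp tax guide", "VAT threshold notice"], fake_llm)
print(ranked, answer)
```

Injecting the model as a callable keeps the score parsing and threshold logic testable without committing to any vendor's client library.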
Optionally, when the highest score in the result list does not exceed the preset threshold, the method further comprises the following steps:
extracting from the result list the knowledge items whose relevance scores are not lower than a preset recommendation threshold, to generate a candidate recommendation set;
sorting the knowledge items in the candidate recommendation set in descending order of relevance score, to generate an ordered recommendation list;
and formatting the ordered recommendation list into a user-readable recommendation interface and returning it.
With this technical solution, to address the poor user experience when the tax knowledge question-answering system cannot find a fully matching answer, the method builds a candidate recommendation set from the knowledge items that meet the recommendation threshold, sorts it in descending order of relevance score, and returns it as a user-readable recommendation interface. This multi-level threshold and recommendation display mechanism improves the user experience when retrieval results are not ideal, and thereby the practicality of the system.
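The two-level threshold can be sketched as below. The threshold values and the plain-text layout of the recommendation list are assumptions; the source only states that a recommendation threshold exists and that the list is formatted for the user.

```python
def recommend(ranked, answer_threshold=0.8, recommend_threshold=0.5):
    """ranked: list of (knowledge_text, relevance_score) pairs.

    Returns None when some item beats the answer threshold (a direct answer
    is returned instead); otherwise a formatted recommendation list.
    """
    if ranked and max(score for _, score in ranked) > answer_threshold:
        return None
    # Keep items scoring at least the recommendation threshold, best first.
    picks = sorted((p for p in ranked if p[1] >= recommend_threshold),
                   key=lambda p: p[1], reverse=True)
    return "\n".join(f"{i}. {text} (relevance {score:.2f})"
                     for i, (text, score) in enumerate(picks, 1))


print(recommend([("A", 0.4), ("B", 0.7), ("C", 0.55)]))
```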
Optionally, after returning the corresponding knowledge item as the reply, the method further comprises the following steps:
generating an interaction record containing the user query, the ID of the returned knowledge item, and the return time, and assigning it a unique identifier;
receiving and recording the user's feedback information under the identifier of the interaction record, to generate a feedback record;
calculating a weight adjustment value for the knowledge item according to the feedback type in the feedback record;
and adding the weight adjustment value to the original weight value of the corresponding entry in the knowledge base, thereby updating that weight value.
With this technical solution, the method first generates a unique interaction ID for each query response and records the user's query content, the ID of the returned knowledge item, and the time. It then collects the user's feedback on the response and stores it in association with the interaction ID, calculates a weight adjustment value according to the feedback type, and applies the adjustment to the original weight of the knowledge item, dynamically updating the basis for retrieval ranking. The system can thus continuously learn from user feedback to optimize the ranking of retrieval results, gradually adapt to users' needs, and improve the user experience.
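A minimal in-memory sketch of this feedback loop is given below. The feedback types, the adjustment values, the default weight of 1.0, and the dictionary-based storage are all hypothetical; the source does not state how the adjustment value is computed.

```python
import time
import uuid

# Hypothetical mapping from feedback type to weight adjustment value.
FEEDBACK_DELTA = {"helpful": 0.05, "not_helpful": -0.05}


def record_interaction(log, query, knowledge_id):
    """Store the query, returned knowledge ID and time under a unique ID."""
    interaction_id = uuid.uuid4().hex
    log[interaction_id] = {
        "query": query,
        "knowledge_id": knowledge_id,
        "returned_at": time.time(),
    }
    return interaction_id


def apply_feedback(log, weights, interaction_id, feedback_type):
    """Attach feedback to the interaction and update the item's weight."""
    record = log[interaction_id]
    record["feedback"] = feedback_type
    delta = FEEDBACK_DELTA.get(feedback_type, 0.0)  # unknown types change nothing
    kid = record["knowledge_id"]
    weights[kid] = weights.get(kid, 1.0) + delta
    return weights[kid]


log, weights = {}, {"K1": 1.0}
iid = record_interaction(log, "What is the VAT rate?", "K1")
print(apply_feedback(log, weights, iid, "helpful"))
```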
In a second aspect, the present application provides a retrieval system for a tax knowledge question-answering system, comprising:
a receiving module, configured to receive a tax question query sentence input by a user;
a word segmentation and tagging module, configured to perform word segmentation and part-of-speech tagging on the query sentence to obtain a word segmentation sequence with part-of-speech tags;
a term marking module, configured to identify and mark, based on the word segmentation sequence, the tax professional terms in the word segmentation sequence by using a pre-constructed tax professional dictionary, to generate a target sequence with tax term marks;
an element extraction module, configured to extract tax elements by using preset tax element identification rules based on the target sequence;
and a retrieval matching module, configured to construct a query vector from the extracted tax elements and perform retrieval matching in a tax knowledge base.
In a third aspect, the present application provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above retrieval method for a tax knowledge question-answering system when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above retrieval method for a tax knowledge question-answering system.
In summary, the present application provides at least one of the following beneficial technical effects:
1. The query sentence is first segmented and tagged with parts of speech to form basic language units; standard terms are then identified by using a pre-constructed tax professional dictionary, key information is extracted based on preset tax element identification rules, and finally a query vector that incorporates professional knowledge is constructed to perform retrieval. The professional terms and key elements in the query sentence can thus be accurately identified, which improves the intelligence level and retrieval quality of tax question answering;
2. To address the poor retrieval performance caused by the many non-standard abbreviations used in tax query sentences, a word segmenter is first used to obtain an initial sequence; abbreviations are then identified through a tax abbreviation comparison table and replaced with standard terms to obtain a standardized sequence, and finally part-of-speech tagging is performed to obtain a tagged sequence. Query sentences containing abbreviations can thus be understood more accurately, which improves the user experience;
3. To address the inaccurate results produced by traditional tax knowledge retrieval systems that rely on vector similarity alone and therefore cannot fully capture query semantics, the user query and the candidate knowledge are first combined into a prompt and input into a large language model to obtain a detailed relevance analysis; relevance scores are then extracted from the analysis result, and finally the candidates are re-ranked by score and the knowledge item whose score exceeds the threshold is returned as the final answer. This improves the method's understanding of query intent and thereby the accuracy of the retrieval results and the user experience.
Drawings
FIG. 1 is a flowchart of a retrieval method for a tax knowledge question-answering system according to an embodiment of the present application;
FIG. 2 is a flowchart of step S120 in the retrieval method for a tax knowledge question-answering system according to an embodiment of the present application;
FIG. 3 is a flowchart of step S130 in the retrieval method for a tax knowledge question-answering system according to an embodiment of the present application;
FIG. 4 is a flowchart of step S140 in the retrieval method for a tax knowledge question-answering system according to an embodiment of the present application;
FIG. 5 is a flowchart of the relevance analysis in the retrieval method for a tax knowledge question-answering system according to an embodiment of the present application;
FIG. 6 is a flowchart of knowledge item recommendation in the retrieval method for a tax knowledge question-answering system according to an embodiment of the present application;
FIG. 7 is a flowchart of updating recommendation weights in the retrieval method for a tax knowledge question-answering system according to an embodiment of the present application;
FIG. 8 is a block diagram of a retrieval system for a tax knowledge question-answering system according to an embodiment of the present application;
FIG. 9 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The terminology used in the following embodiments of the application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in this application encompasses any and all possible combinations of one or more of the listed items.
The terms "first," "second," and the like are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features. In the description of the embodiments of the application, unless otherwise indicated, "a plurality of" means two or more.
Embodiments of the application are described in further detail below with reference to the drawings.
In a first aspect, the present application provides a retrieval method for a tax knowledge question-answering system which, referring to FIG. 1, comprises the following steps:
S110, receiving a tax question query sentence input by a user.
In this embodiment, a tax question query sentence entered by the user in natural language is received through the user interaction interface of the system. The query sentence can be a complete question, such as "What is the enterprise income tax rate?", or a set of keywords, such as "value-added tax threshold for small-scale taxpayers".
Specifically, the system first preprocesses the received query sentence, including removing redundant spaces, normalizing punctuation marks, and converting full-width characters to half-width characters, to ensure consistency in subsequent processing. For example, "is the business income tax rate.
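The preprocessing operations listed above can be sketched as follows. Unicode NFKC normalization is used here as one way to fold full-width characters to half-width; the source does not name a specific technique, and the punctuation map is a hypothetical example.

```python
import re
import unicodedata

# Hypothetical punctuation map for marks that NFKC leaves unchanged.
PUNCT_MAP = str.maketrans({"。": ".", "、": ",", "“": '"', "”": '"'})


def preprocess_query(text: str) -> str:
    # NFKC folds full-width letters, digits and punctuation
    # (for example "ＶＡＴ" and "？") to their half-width forms.
    text = unicodedata.normalize("NFKC", text)
    # Normalize the remaining punctuation marks.
    text = text.translate(PUNCT_MAP)
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(preprocess_query("  ＶＡＴ　 rate？ "))
```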
S120, performing word segmentation and part-of-speech tagging on the query sentence to obtain a word segmentation sequence with part-of-speech tags.
In this embodiment, the query sentence is segmented by a word segmentation method based on statistical probability combined with a preset dictionary. The system maintains in advance a comprehensive dictionary containing general vocabulary and professional vocabulary, which guides the word segmentation process and improves segmentation accuracy, in particular the ability to recognize vocabulary specific to the tax field.
Specifically, the system first performs preliminary word segmentation on the text by using a maximum matching algorithm, and then tags each segmented word with its part of speech. For example, for the query sentence "What is the enterprise income tax rate?", the segmentation and tagging result, glossed word by word in the original word order, is "enterprise income tax/n of/u tax rate/n is/v what/r", where n denotes a noun, u an auxiliary word, v a verb, and r a pronoun.
S130, based on the word segmentation sequence, identifying and marking the tax professional terms in the word segmentation sequence by using a pre-constructed tax professional dictionary, to generate a target sequence with tax term marks.
In this embodiment, the system performs term recognition through a pre-constructed tax professional dictionary. The dictionary contains standard tax terms and their common variants and supports both exact matching and fuzzy matching, so that the professional terms in the user query can be accurately identified.
Specifically, the system traverses the consecutive phrases in the word segmentation sequence and matches them against the tax professional dictionary. When a match is found, the phrase is marked with a special tag, thereby generating a target sequence with semantic tags.
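The traversal of consecutive phrases can be sketched as a longest-first window over the token list. The tag names "TAX_TERM" and "O", the maximum window of four tokens, and joining tokens with a space are assumptions for this English illustration; for Chinese text the tokens would be concatenated without a separator.

```python
def tag_terms(tokens, term_dict, max_span=4):
    """Merge consecutive tokens that form a dictionary term and tag them."""
    tagged, i = [], 0
    while i < len(tokens):
        # Try the longest window first so complete terms win over parts.
        for span in range(min(max_span, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + span])
            if phrase in term_dict:
                tagged.append((phrase, "TAX_TERM"))
                i += span
                break
        else:
            # No window starting at i matched: keep the token unmarked.
            tagged.append((tokens[i], "O"))
            i += 1
    return tagged


terms = {"small-scale taxpayer", "value-added tax"}
print(tag_terms(["small-scale", "taxpayer", "value-added", "tax", "threshold"], terms))
```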
S140, based on the target sequence, extracting tax elements by using preset tax element identification rules.
In this embodiment, the system extracts the key elements of the query based on predefined tax element identification rules. These rules include recognition patterns for elements such as tax types, taxpayers, taxable objects, and tax rates, and can extract the core semantic information of the query from the tagged sequence.
Specifically, the system identifies the elements in the sequence by rule matching, extracts the tax type elements and the queried attribute elements, and marks the subordinate relations between them.
S150, constructing a query vector from the extracted tax elements, and performing retrieval matching in a tax knowledge base.
In this embodiment, the system converts the extracted tax elements into a structured query vector and uses it for retrieval matching in the knowledge base. The knowledge base stores a large number of tax policies and regulations, practical operation guides, and related content.
Specifically, the system assigns a different weight to each dimension of the query vector; for example, the tax type dimension has a higher weight and the attribute dimension a lower one. When retrieval is performed, weighted cosine similarity is used to find the knowledge items most similar to the query vector.
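Weighted cosine similarity can be illustrated with a short sketch. It assumes the common definition in which each dimension's weight scales both the dot product and the norms; the source does not give the formula, and the two-dimensional vectors and the weights (2, 1) are hypothetical.

```python
import math


def weighted_cosine(q, d, w):
    """Cosine similarity where dimension i contributes with weight w[i]."""
    dot = sum(wi * qi * di for wi, qi, di in zip(w, q, d))
    norm_q = math.sqrt(sum(wi * qi * qi for wi, qi in zip(w, q)))
    norm_d = math.sqrt(sum(wi * di * di for wi, di in zip(w, d)))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_q * norm_d)


# Dimensions: (tax type, attribute); the tax type is weighted higher.
weights = [2.0, 1.0]
query = [1.0, 1.0]
doc_same_tax_type = [1.0, 0.0]
doc_same_attribute = [0.0, 1.0]
print(weighted_cosine(query, doc_same_tax_type, weights))
print(weighted_cosine(query, doc_same_attribute, weights))
```

With these weights, a document that shares only the tax type scores higher than one that shares only the attribute, which is the ordering the weighting is meant to produce.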
In one embodiment, referring to FIG. 2, step S120 of performing word segmentation and part-of-speech tagging on the query sentence to obtain a word segmentation sequence with part-of-speech tags specifically comprises the following steps:
S121, performing preliminary word segmentation on the query sentence by using a word segmenter to obtain an initial word segmentation sequence.
In this embodiment, the system uses a word segmentation method that combines a dictionary with a statistical model. The word segmenter loads a preset dictionary library comprising a general vocabulary dictionary and a tax-specific dictionary, in which each entry carries basic attribute information such as word frequency and part of speech.
Specifically, the word segmenter first performs a preliminary segmentation of the text with the forward maximum matching algorithm, while computing the probabilities of the different segmentation schemes with a statistical language model, and selects the segmentation with the highest probability. When an ambiguous segmentation is encountered, the system disambiguates by referring to word frequency information and context, and selects the optimal segmentation path.
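The forward maximum matching part of this step can be sketched as follows. The sketch omits the statistical language model and the frequency-based disambiguation described above; the small lexicon, the maximum word length of 8, and the Chinese rendering of the example query are assumptions for illustration.

```python
def forward_max_match(text, lexicon, max_len=8):
    """Greedy left-to-right segmentation that prefers the longest lexicon entry."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until an entry matches;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical lexicon mixing general and tax-specific entries.
lexicon = {"企业所得税", "企业", "所得税", "税率", "多少"}
print(forward_max_match("企业所得税的税率是多少", lexicon))
```

Because the longest entry wins, "企业所得税" is kept as one unit rather than being split into "企业" and "所得税", which is the behavior the tax-specific dictionary is meant to secure.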
S122, matching the initial word segmentation sequence against a preset tax abbreviation comparison table to identify tax professional abbreviations.
In this embodiment, the system maintains in advance a dedicated tax abbreviation comparison table, which records the professional abbreviations commonly used in the tax field and their corresponding standard full names. The table is stored as key-value pairs containing information such as the abbreviation, the full name, and the usage frequency, and supports both exact matching and fuzzy matching.
Specifically, the system matches each word in the initial word segmentation sequence against the abbreviation comparison table. For example, when the user input contains professional abbreviations such as "IIT" (individual income tax), "CIT" (enterprise income tax), "VAT" (value-added tax), or "RTF" (stamp tax), the system can accurately recognize them. Common industry short forms, such as those for individually-owned businesses, special value-added tax invoices, and general value-added tax invoices, can likewise be matched exactly through the comparison table.
S123, replacing the identified tax professional abbreviations with corresponding standard terms according to the tax abbreviation comparison table to obtain a standardized word segmentation sequence.
In this embodiment, the system replaces the identified abbreviations with corresponding standard names according to the mapping relation in the abbreviation comparison table.
Specifically, when the system finds an abbreviation in the word segmentation sequence, it replaces the abbreviation directly with the standard full name from the comparison table.
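Steps S122 and S123 amount to a table lookup followed by substitution. The sketch below shows exact matching only (fuzzy matching is omitted). The table contents, the case-insensitive comparison and the name `normalize_abbreviations` are assumptions made for illustration.

```python
# Hypothetical excerpt of the abbreviation comparison table (key-value pairs)
ABBREVIATIONS = {
    "IIT": "individual income tax",
    "CIT": "enterprise income tax",
    "VAT": "value-added tax",
}

def normalize_abbreviations(tokens, table=ABBREVIATIONS):
    """Replace each token found in the table (case-insensitively) by its full name."""
    upper_table = {key.upper(): full_name for key, full_name in table.items()}
    return [upper_table.get(tok.upper(), tok) for tok in tokens]

normalized = normalize_abbreviations(["vat", "rate", "for", "CIT"])
```

Tokens that are not in the table pass through unchanged, so the output is the standardized word segmentation sequence with the same length as the input.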
S124, performing part-of-speech tagging on the standardized word segmentation sequence to obtain a word segmentation sequence with part-of-speech tagging.
In this embodiment, the system adopts a part-of-speech tagger based on a conditional random field. After training on a large-scale tax text corpus, the tagger can accurately identify the part of speech of tax-specific vocabulary. The tag set includes basic parts of speech such as nouns, verbs and adjectives, as well as special tags such as tax type, tax rate and deadline.
Specifically, the tagger judges the part of speech of each word in the standardized word segmentation sequence, determining it from the context of the word and the features of the word itself.
In one embodiment, referring to fig. 3, step S130, in which the tax professional terms in the word segmentation sequence are identified and marked by using the pre-built tax professional dictionary to generate a target sequence with tax term marks, specifically includes the following steps:
S131, matching the continuous phrases in the word segmentation sequence with the tax professional dictionary, and identifying standard tax professional terms to obtain a preliminary marking sequence.
In this embodiment, the system processes word sequences using a multi-level matching strategy. The tax professional dictionary is classified and stored according to different dimensions of tax category, collection management, tax payment subject and the like, and each term contains attribute information such as standard name, category label, synonym set and the like.
Specifically, the system scans the word segmentation sequence with a sliding window over consecutive phrases. The window size starts at the maximum phrase length and is reduced step by step, and the phrase in each window is matched against the professional dictionary in turn.
S132, calculating the character similarity of the unrecognized phrase in the preliminary marking sequence and each term in the tax professional dictionary to obtain a similarity result set.
In this embodiment, the system performs fuzzy matching on the phrase which cannot be matched accurately by using a character similarity calculation method. The similarity calculation comprehensively considers a plurality of factors such as editing distance, character overlap ratio, position weight and the like, and a candidate term list is generated for each phrase to be matched.
Specifically, the system calculates the similarity between each unrecognized phrase and the dictionary terms. For example, when a user enters the non-standard expression "enterprise tax preference", the system calculates its similarity to dictionary terms such as "enterprise income tax preferential policy" and "tax preferential policy", and records the similarity scores.
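One possible form of the character similarity calculation is sketched below. It combines a normalized edit distance with a character overlap ratio and leaves out the position weight. The equal weighting of the two factors, the function names and the Chinese example strings are assumptions, so the scores it produces are illustrative and are not the scores quoted in the description.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def char_similarity(a, b, w_edit=0.5, w_overlap=0.5):
    """Weighted mix of edit-distance similarity and character overlap (Jaccard)."""
    if not a or not b:
        return 0.0
    edit_sim = 1 - edit_distance(a, b) / max(len(a), len(b))
    overlap = len(set(a) & set(b)) / len(set(a) | set(b))
    return w_edit * edit_sim + w_overlap * overlap

def candidates(phrase, terms):
    """Candidate term list for one phrase, best match first."""
    scored = [(term, char_similarity(phrase, term)) for term in terms]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical inputs: a non-standard phrase and two dictionary terms
ranked = candidates("企业税优惠", ["税收优惠政策", "企业所得税优惠政策"])
```

Here the longer term that contains every character of the query phrase ranks first, which is the kind of candidate list step S132 hands to the threshold screening in S133.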
S133, screening out phrases with similarity higher than a preset threshold value and corresponding standard tax professional terms from the similarity result set to obtain a fuzzy matching sequence.
In this embodiment, the system sets a dynamic similarity threshold mechanism. The base threshold is set to 0.75 and the system dynamically adjusts the threshold based on the query context and phrase length. For shorter phrases, the system adopts a higher threshold value to ensure matching precision, and for longer phrases, the threshold value is lowered to improve recall rate.
Specifically, the system sorts and screens the similarity result set. For example, when "enterprise tax preference" is processed, if its similarity to "enterprise income tax preferential policy" is 0.82, which exceeds the threshold, the pair is recorded in the fuzzy matching sequence and marked.
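The dynamic threshold can be sketched as a function of phrase length around the stated base value of 0.75. The length cut-offs (`short_len`, `long_len`) and the adjustment step of 0.05 are assumptions, since the description does not give them, and the adjustment based on query context is omitted.

```python
BASE_THRESHOLD = 0.75  # base value stated in the description

def dynamic_threshold(phrase, short_len=3, long_len=8, step=0.05):
    """Raise the threshold for short phrases and lower it for long ones."""
    if len(phrase) <= short_len:
        return BASE_THRESHOLD + step   # favour precision
    if len(phrase) >= long_len:
        return BASE_THRESHOLD - step   # favour recall
    return BASE_THRESHOLD

def fuzzy_matches(similarity_results):
    """Keep (phrase, term, score) triples whose score exceeds the phrase's threshold."""
    return [(p, t, s) for p, t, s in similarity_results if s > dynamic_threshold(p)]

kept = fuzzy_matches([
    ("enterprise tax preference", "enterprise income tax preferential policy", 0.82),
    ("VAT", "value-added tax", 0.78),  # hypothetical short phrase and score
])
```

In this example the long phrase with score 0.82 is kept, while the short phrase with score 0.78 is rejected because short phrases face the stricter threshold.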
S134, combining the preliminary marking sequence and the fuzzy matching sequence to generate a target sequence with tax term marks.
In this embodiment, the system uses a priority policy to merge the marking sequences. Exact match results have the highest priority, followed by fuzzy match results. When marks overlap, the system selects the optimal marking scheme based on the priority rules and coverage.
Specifically, the system merges the marks in the preliminary marking sequence and the fuzzy matching sequence in the order of their positions in the original text.
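A simplified version of the merge in S134 is sketched below. It assumes each mark is a `(start, end, term)` tuple with an exclusive end position, gives exact matches absolute priority and drops any fuzzy mark that overlaps an already accepted mark. The coverage-based tie-breaking mentioned in the description is not modelled, and the example marks are hypothetical.

```python
def merge_marks(exact, fuzzy):
    """Merge two mark lists; exact marks win, result is ordered by text position."""
    merged = list(exact)
    for mark in fuzzy:
        start, end, _ = mark
        # Two half-open spans overlap when each one starts before the other ends.
        overlaps = any(start < e and s < end for s, e, _ in merged)
        if not overlaps:
            merged.append(mark)
    return sorted(merged, key=lambda m: m[0])

exact = [(0, 3, "value-added tax")]
fuzzy = [
    (2, 5, "overlapping term"),                              # collides with the exact mark
    (6, 10, "enterprise income tax preferential policy"),    # no collision
]
merged = merge_marks(exact, fuzzy)
```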
In one embodiment, referring to fig. 4, step S140, in which tax elements are extracted from the target sequence by using preset tax element identification rules, specifically includes the following steps:
S141, matching the target sequence with a preset associated word list, and identifying the associated words in the sequence and their positions to obtain an associated word position sequence.
In this embodiment, the system maintains an associated word list covering several categories, including logical, conditional and temporal associated words. Each category is annotated with its semantic function and usage scenario, and the system uses the recognized associated words to understand the semantic structure of the query sentence.
Specifically, the system traverses the target sequence and matches each phrase against the associated word list. For example, for the query "the tax rates of enterprise income tax and value-added tax, as well as the tax threshold for small-scale taxpayers", the system identifies the positions of the two associated words "and" and "as well as", and marks them as parallel associated words.
S142, dividing the target sequence into a plurality of subsequences according to the related word position sequence to obtain a subsequence set.
In this embodiment, the system uses a divide-and-conquer strategy to divide the target sequence into several semantically complete subsequences according to the positions of the recognized associated words.
Specifically, taking the above query as an example, the system divides the sequence at the two associated words "and" and "as well as" into three subsequences: "tax rate of enterprise income tax", "tax rate of value-added tax" and "tax threshold for small-scale taxpayers". Each subsequence retains its original tax term marks for subsequent element extraction.
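Steps S141 and S142 can be sketched as a scan for connectives followed by a split at their positions. The token list, the two-entry connective table and the name `split_by_connectives` are assumptions. Note that this sketch only cuts the sequence; it does not propagate a shared attribute (such as "tax rate") to every subsequence, which the example in the description implies.

```python
# Hypothetical associated word list: connective -> semantic type
CONNECTIVES = {"and": "parallel", "as well as": "parallel"}

def split_by_connectives(tokens):
    """Return the connective positions and the subsequences between them."""
    positions = [(i, CONNECTIVES[t]) for i, t in enumerate(tokens) if t in CONNECTIVES]
    subsequences, current = [], []
    for tok in tokens:
        if tok in CONNECTIVES:
            if current:
                subsequences.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        subsequences.append(current)
    return positions, subsequences

tokens = ["enterprise income tax", "and", "value-added tax", "tax rate",
          "as well as", "small-scale taxpayer", "tax threshold"]
positions, subsequences = split_by_connectives(tokens)
```

The returned positions keep the semantic type of each connective, so a later step can use them to mark the logical relationship between the extracted elements.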
S143, respectively extracting tax elements from the subsequence set by using a preset tax element identification rule to obtain a preliminary element set.
In this embodiment, the system uses a rule-based element recognition method and defines specific recognition rules for the different types of tax elements (e.g., tax types, tax rates, deadlines and conditions).
Specifically, the system applies the corresponding element recognition rules to each subsequence. For example, for the subsequence "tax rate of enterprise income tax", the rules identify the tax type element "enterprise income tax" and the attribute element "tax rate"; for "tax threshold for small-scale taxpayers", they identify the taxpayer subject element "small-scale taxpayer" and the attribute element "tax threshold". Both the original expression and the standardized expression of each element are retained during extraction, forming the preliminary element set.
S144, marking the logic relation among the elements in the preliminary element set according to the semantic type of the associated word in the associated word position sequence, and generating a complete element set containing the elements and the associated relation.
In this embodiment, the system constructs a network of logical relationships between the elements based on the semantic types of the associated words. The system defines several relationship types, including parallel, subordinate, conditional and adversative relationships, to describe the semantic links between different elements. The resulting complete element set contains not only the specific content of each element but also the logical relationships among the elements, forming a structured semantic network.
In one embodiment, referring to fig. 5, after the retrieval matching is performed in the tax knowledge base, the method further includes the following steps:
S510, combining the candidate knowledge list obtained by retrieval with the original query sentence of the user to construct an input prompt.
In this embodiment, the system adopts a template-based prompt construction method: a dedicated prompt template is designed in advance to guide the large language model in analyzing the relevance between the query and the candidate knowledge. The prompt template contains several parts, including the original query, the candidate knowledge content, the task description and the scoring requirements, and organizes this information in a structured manner.
Specifically, the system combines the user query and the candidate knowledge according to the preset template. For example, for the query "What is the value-added tax rate for small-scale taxpayers?", the system may construct the prompt: "Please analyze how relevant the following tax policy is to the user's question 'What is the value-added tax rate for small-scale taxpayers?': [candidate knowledge content]. Please evaluate it on the three dimensions of expertise, completeness and timeliness, and give a composite score of 0-10."
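The template-based construction in S510 can be sketched with ordinary string formatting. The template wording follows the example in the description; building one prompt per candidate, the names `PROMPT_TEMPLATE` and `build_prompts` and the candidate texts are assumptions.

```python
PROMPT_TEMPLATE = (
    "Please analyze how relevant the following tax policy is to the user's question "
    "\"{query}\":\n{knowledge}\n"
    "Please evaluate it on three dimensions (expertise, completeness, timeliness) "
    "and give a composite score of 0-10."
)

def build_prompts(query, candidates):
    """Build one input prompt per candidate knowledge item."""
    return [PROMPT_TEMPLATE.format(query=query, knowledge=c) for c in candidates]

prompts = build_prompts(
    "What is the value-added tax rate for small-scale taxpayers?",
    ["[candidate knowledge content 1]", "[candidate knowledge content 2]"],
)
```

Scoring each candidate in its own prompt keeps the model output easy to parse; a single prompt listing all candidates would be an equally valid reading of the description.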
S520, the input prompt is transmitted into a large language model, and a correlation analysis result of each candidate knowledge is obtained.
In this embodiment, the system inputs the constructed prompt into the pre-optimized large language model. The model can accurately understand the professional content and nuances of tax policy texts through special training of knowledge in the tax field, so that accurate judgment can be made on the correlation of the knowledge.
S530, extracting the correlation score of each candidate knowledge based on the correlation analysis result.
In this embodiment, the system uses regular expressions and text parsing rules to extract numerical scores from the analysis text output by the model. The system can identify the scores of the individual dimensions and the final composite score, and performs a plausibility check to ensure that each extracted score falls within the expected range.
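A minimal sketch of the score extraction in S530 follows. It assumes the model states its result in the form "composite score: N"; the real output format is not specified in the description, so the pattern and the name `extract_score` are illustrative only.

```python
import re

# Assumed output wording; accepts an ASCII or full-width colon and a decimal number
SCORE_PATTERN = re.compile(r"composite score\s*[:：]?\s*(\d+(?:\.\d+)?)", re.IGNORECASE)

def extract_score(analysis_text, low=0.0, high=10.0):
    """Return the composite score, or None if it is missing or out of range."""
    match = SCORE_PATTERN.search(analysis_text)
    if match is None:
        return None
    score = float(match.group(1))
    return score if low <= score <= high else None

score = extract_score("Expertise: 8. Completeness: 7. Timeliness: 8. Composite score: 7.5")
```

Returning `None` for a missing or implausible value lets the caller decide whether to retry the model call or discard the candidate.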
S540, the candidate knowledge is reordered according to the relevance score, and a result list is generated.
In this embodiment, the system ranks the candidate knowledge based on the relevance score while considering other attributes of the knowledge, such as policy timeliness, file level, etc. The sorting algorithm adopts a weighted sorting mode, and the weights of different attributes can be adjusted according to actual application scenes.
Specifically, the system arranges the candidate knowledge in descending order of the composite score to generate a new result list.
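The weighted ordering in S540 can be sketched as below. The attribute names, their 0-10 scales and the weight values are assumptions; the description only states that relevance is combined with attributes such as policy timeliness and document level and that the weights are adjustable.

```python
def rerank(candidates, weights):
    """Sort candidates by the weighted sum of their attributes, highest first."""
    def composite(candidate):
        return sum(weights[name] * candidate[name] for name in weights)
    return sorted(candidates, key=composite, reverse=True)

weights = {"relevance": 0.7, "timeliness": 0.2, "level": 0.1}  # assumed weights
candidates = [
    {"id": "k1", "relevance": 8.0, "timeliness": 2.0, "level": 5.0},
    {"id": "k2", "relevance": 7.5, "timeliness": 9.0, "level": 5.0},
]
ranked = rerank(candidates, weights)
```

In this example k2 overtakes k1 despite a slightly lower relevance score, because its policy is much more current.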
S550, judging whether the highest score in the result list exceeds a preset threshold value, and if so, returning the corresponding knowledge as a reply.
In this embodiment, the system sets a dynamic threshold mechanism with a base threshold of 7 points. The system dynamically adjusts the threshold based on factors such as the complexity of the query and the number of candidate knowledge items. When the highest score exceeds the threshold, the system considers the knowledge sufficiently relevant to be returned as a valid answer.
In one embodiment, referring to fig. 6, when the highest score in the result list does not exceed the preset threshold, the method further includes the following steps:
S610, extracting knowledge items with the relevance score not lower than a preset recommendation threshold value from the result list, and generating a candidate recommendation set.
In this embodiment, the system sets a recommendation threshold below the master threshold for screening potentially relevant knowledge. The system will collect all knowledge items with scores above the recommendation threshold to form a candidate recommendation pool.
Specifically, the system traverses each knowledge item in the results list, checking if its score reaches the recommendation threshold. For example, for a result list containing 10 pieces of knowledge, if 4 pieces of knowledge have scores of 6.8, 6.5, 5.8 and 5.2, respectively, and the recommendation threshold is 5, then all 4 pieces of knowledge are included in the candidate recommendation set. The system can record key attributes of each knowledge, such as policy validity period, application range and the like, for subsequent display and sorting.
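The decision between answering directly (S550) and falling back to recommendations (S610) can be sketched as follows. The two thresholds use the base value of 7 and the example value of 5 from the description and are kept fixed here; the dynamic adjustment is omitted. The function name, the tuple layout and the fifth item are assumptions.

```python
ANSWER_THRESHOLD = 7.0     # base threshold from the description
RECOMMEND_THRESHOLD = 5.0  # value used in the description's example

def answer_or_recommend(result_list):
    """result_list: (knowledge_id, score) pairs sorted by score, highest first."""
    if result_list and result_list[0][1] > ANSWER_THRESHOLD:
        return "answer", [result_list[0]]
    recommended = [r for r in result_list if r[1] >= RECOMMEND_THRESHOLD]
    return "recommend", recommended

# Scores from the example, plus one hypothetical item below the recommendation threshold
results = [("k1", 6.8), ("k2", 6.5), ("k3", 5.8), ("k4", 5.2), ("k5", 4.1)]
kind, items = answer_or_recommend(results)
```

Since no score exceeds 7, the four items scoring at least 5 form the candidate recommendation set.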
S620, sorting in descending order according to the relevance score of each knowledge item in the candidate recommendation set, and generating an ordered recommendation list.
In this embodiment, the system adopts a multidimensional ranking strategy, and in addition to considering the relevance score, factors such as timeliness, frequency of use, document source and the like of the knowledge items are combined. The system assigns a weight to each dimension and obtains a final ranking score by weighted calculation.
S630, formatting the ordered recommendation list into a recommendation interface readable by a user and returning.
In one embodiment, referring to fig. 7, after returning the corresponding knowledge as a reply, the following steps are further included:
S710, generating an interaction record containing the user query, the returned knowledge ID and the return time, and assigning a unique identifier.
In this embodiment, the system employs a distributed ID generation algorithm to generate a globally unique identifier for each query interaction. The interaction record contains the user information (desensitized), the query content, the ID of the returned knowledge, the interaction timestamp, the query source and so on.
Specifically, when the system returns a tax policy knowledge, an interaction record is created immediately.
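An interaction record of this kind can be sketched as a small data class. The description does not name the distributed ID generation algorithm, so a random UUID is used here as a stand-in; the field names and the example values are likewise assumptions.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class InteractionRecord:
    query: str
    knowledge_id: str
    source: str = "web"  # hypothetical query source
    returned_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Stand-in for the distributed ID generator: a random 128-bit UUID in hex form
    record_id: str = field(default_factory=lambda: uuid.uuid4().hex)

record = InteractionRecord(
    query="What is the value-added tax rate for small-scale taxpayers?",
    knowledge_id="kb-001",
)
```

Later feedback can then be stored with `record.record_id` as its key, which links it back to the query and the returned knowledge item.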
S720, receiving and recording feedback information of the user based on the identifier corresponding to the interaction record, and generating a feedback record.
In this embodiment, the feedback information of the user includes explicit feedback (such as active evaluation of the user) and implicit feedback (such as browsing duration, whether to collect, etc.). And the feedback information is associated with the interaction record through the unique identifier to form a complete user experience data chain.
S730, calculating a weight adjustment value of the knowledge item according to the feedback type in the feedback record.
In this embodiment, the system designs a corresponding weight adjustment algorithm according to different types of feedback. The algorithm considers factors such as feedback type, user activity, feedback timeliness and the like, and obtains a final adjustment value through weighted calculation.
S740, adding the weight adjustment value to the original weight value of the corresponding entry in the knowledge base, and updating the original weight value.
In this embodiment, the system adopts a progressive weight update strategy to modify the original weight of the knowledge item by accumulating the adjustment values fed back multiple times. The time decay factor is considered in the updating process, so that the recent feedback has a larger influence. The system also regularly normalizes the weight values to maintain the balance of the overall weight distribution.
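Steps S730 and S740 can be sketched as an additive update with exponential time decay. The feedback values, the 30-day half-life and the sum-to-one normalization are assumptions, and user activity is not modelled; only the additive update and the decay are taken from the description.

```python
# Hypothetical base adjustment per feedback type
FEEDBACK_VALUES = {"like": 0.10, "dislike": -0.10, "bookmark": 0.05}

def adjustment(feedback_type, age_days, half_life_days=30.0):
    """Adjustment value that halves every half_life_days, so recent feedback counts more."""
    decay = 0.5 ** (age_days / half_life_days)
    return FEEDBACK_VALUES[feedback_type] * decay

def update_weight(original, feedbacks):
    """Add the accumulated adjustment values to the original weight."""
    return original + sum(adjustment(kind, age) for kind, age in feedbacks)

def normalize(weights):
    """Rescale weights so they sum to 1; left unchanged if the total is zero."""
    total = sum(weights.values())
    if total == 0:
        return dict(weights)
    return {key: value / total for key, value in weights.items()}

new_weight = update_weight(1.0, [("like", 0), ("dislike", 30)])
```

In the example a fresh "like" contributes its full value while a 30-day-old "dislike" contributes half of its value, so the net effect on the weight is positive.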
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application.
In a second aspect, the present application provides a retrieval system for a tax knowledge question-answering system, and the retrieval system for a tax knowledge question-answering system of the present application will be described below with reference to the above retrieval method for a tax knowledge question-answering system.
Referring to fig. 8, a retrieval system for a tax knowledge question-and-answer system, comprising:
the receiving module is used for receiving a tax question query sentence input by a user;
the word segmentation marking module is used for performing word segmentation and part-of-speech tagging on the query sentence to obtain a word segmentation sequence with part-of-speech tags;
the term marking module is used for identifying and marking, based on the word segmentation sequence, the tax professional terms therein by using a pre-built tax professional dictionary to generate a target sequence with tax term marks;
the element extraction module is used for extracting tax elements by using preset tax element identification rules based on the target sequence;
and the retrieval matching module is used for constructing a query vector according to the extracted tax elements and performing retrieval matching in the tax knowledge base.
In one embodiment, the word segmentation annotation module includes:
The preliminary word segmentation unit is used for carrying out preliminary word segmentation on the query sentence by utilizing a word segmentation device to obtain an initial word segmentation sequence;
the abbreviation identification unit is used for matching the initial word segmentation sequence with a preset tax abbreviation comparison table and identifying tax professional abbreviations;
The standardized unit is used for replacing the identified tax professional abbreviations with corresponding standard terms according to the tax abbreviation comparison table to obtain standardized word segmentation sequences;
the part-of-speech tagging unit is used for performing part-of-speech tagging on the standardized word segmentation sequence to obtain a word segmentation sequence with part-of-speech tagging.
In one embodiment, the term tagging module includes:
the standard term identification unit is used for matching the continuous phrase in the word segmentation sequence with the tax professional dictionary and identifying standard tax professional terms to obtain a preliminary marking sequence;
the similarity calculation unit is used for calculating, for each unrecognized phrase in the preliminary marking sequence, the character similarity to each term in the tax professional dictionary to obtain a similarity result set;
the fuzzy matching unit is used for screening out phrases with similarity higher than a preset threshold value and corresponding standard tax professional terms from the similarity result set to obtain a fuzzy matching sequence;
And the sequence merging unit is used for merging the preliminary marking sequence and the fuzzy matching sequence to generate a target sequence with tax term marks.
In one embodiment, the element extraction module includes:
The related word recognition unit is used for matching the target sequence with a preset related word list, recognizing related words and positions in the sequence and obtaining a related word position sequence;
The sequence dividing unit is used for dividing the target sequence into a plurality of subsequences according to the related word position sequence to obtain a subsequence set;
The element identification unit is used for respectively extracting tax elements from the subsequence set by utilizing a preset tax element identification rule to obtain a preliminary element set;
the relation marking unit is used for marking the logic relation among the elements in the preliminary element set according to the semantic type of the associated word in the associated word position sequence and generating a complete element set containing the elements and the associated relation.
In one embodiment, the retrieval system further includes:
The prompt construction module is used for combining the candidate knowledge list obtained by retrieval with the original query statement of the user to construct an input prompt;
The correlation analysis module is used for transmitting the input prompt into the large language model to obtain a correlation analysis result of each candidate knowledge;
The score extraction module is used for extracting the relevance score of each candidate knowledge based on the relevance analysis result;
The sequencing module is used for re-sequencing the candidate knowledge according to the relevance score to generate a result list;
and the judging and returning module is used for judging whether the highest score in the result list exceeds a preset threshold value, and returning the corresponding knowledge as a reply if the highest score exceeds the preset threshold value.
In one embodiment, the retrieval system further includes:
The recommendation set generation module is used for extracting knowledge items with the relevance score not lower than a preset recommendation threshold value from the result list and generating a candidate recommendation set;
the recommendation ordering module is used for ordering in a descending order according to the relevance score of each knowledge item in the candidate recommendation set, and generating an ordered recommendation list;
And the recommendation display module is used for formatting the ordered recommendation list into a recommendation interface readable by a user and returning the recommendation interface.
In one embodiment, the retrieval system further includes:
the interaction record module is used for generating an interaction record comprising user inquiry, returned knowledge ID and returned time and distributing a unique identifier;
the feedback recording module is used for receiving and recording feedback information of the user based on the identifier corresponding to the interaction record and generating a feedback record;
the weight calculation module is used for calculating a weight adjustment value of the knowledge item according to the feedback type in the feedback record;
and the weight updating module is used for adding the weight adjustment value to the original weight value of the corresponding entry in the knowledge base and updating the original weight value.
In one embodiment, the present application provides an electronic device, which may be a server, and an internal structure thereof may be as shown in fig. 9. The electronic device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the electronic device is for storing data. The network interface of the electronic device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a retrieval method for a tax knowledge question-answering system.
It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the electronic device to which the present application is applied, and that a particular electronic device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided an electronic device including a memory and a processor, the memory storing a computer program, the processor implementing the steps of the method embodiments described above when executing the computer program.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, or the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration, and not limitation, RAM can be in various forms such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), etc.
The above embodiments are not intended to limit the scope of the application, so that the equivalent changes of the structure, shape and principle of the application are covered by the scope of the application.

Claims (9)

1.一种用于税务知识问答系统的检索方法,其特征在于,包括如下步骤:1. A retrieval method for a tax knowledge question-answering system, characterized in that it comprises the following steps: 接收用户输入的税务问题查询语句;Receive tax query statements input by users; 对所述查询语句进行分词和词性标注,得到带词性标注的分词序列;Performing word segmentation and part-of-speech tagging on the query sentence to obtain a word segmentation sequence with part-of-speech tags; 基于所述分词序列,利用预先构建的税务专业词典识别并标记其中的税务专业术语,生成带有税务术语标记的目标序列;Based on the word segmentation sequence, the tax professional terms therein are identified and marked using a pre-built tax professional dictionary to generate a target sequence marked with tax terminology; 基于所述目标序列,利用预设的税务要素识别规则提取税务要素;Based on the target sequence, extracting tax elements using preset tax element identification rules; 根据提取的税务要素构建查询向量,在税务知识库中执行检索匹配。A query vector is constructed based on the extracted tax elements, and retrieval matching is performed in the tax knowledge base. 2.根据权利要求1所述的用于税务知识问答系统的检索方法,其特征在于,对查询语句进行分词和词性标注,得到带词性标注的分词序列,具体包括如下步骤:2. 
The retrieval method for the tax knowledge question-answering system according to claim 1 is characterized in that the query sentence is segmented and POS tagged to obtain a segmented word sequence with POS tagged, which specifically comprises the following steps: 利用分词器对查询语句进行初步分词,得到初始分词序列;Use the word segmenter to perform preliminary word segmentation on the query statement to obtain an initial word segmentation sequence; 将所述初始分词序列与预设的税务缩写对照表进行匹配,识别税务专业缩写词;Matching the initial word segmentation sequence with a preset tax abbreviation comparison table to identify tax professional abbreviations; 根据所述税务缩写对照表,将识别出的税务专业缩写词替换为对应的标准术语,得到标准化分词序列;According to the tax abbreviation comparison table, the identified tax professional abbreviations are replaced with corresponding standard terms to obtain a standardized word segmentation sequence; 对所述标准化分词序列进行词性标注,得到带词性标注的分词序列。Part-of-speech tagging is performed on the standardized word segmentation sequence to obtain a word segmentation sequence with part-of-speech tagging. 3.根据权利要求1所述的用于税务知识问答系统的检索方法,其特征在于,基于所述分词序列,利用预先构建的税务专业词典识别并标记其中的税务专业术语,生成带有税务术语标记的目标序列,具体包括如下步骤:3. 
The retrieval method for a tax knowledge question-and-answer system according to claim 1 is characterized in that, based on the word segmentation sequence, a pre-built tax professional dictionary is used to identify and mark the tax professional terms therein, and a target sequence with tax term marks is generated, which specifically includes the following steps: 将所述分词序列中的连续词组与税务专业词典进行匹配,识别标准税务专业术语,得到初步标记序列;Matching the continuous phrases in the word segmentation sequence with the tax professional dictionary, identifying standard tax professional terms, and obtaining a preliminary tag sequence; 对所述初步标记序列中未被识别的词组,计算与税务专业词典中各术语的字符相似度,得到相似度结果集;For the unrecognized phrases in the preliminary tag sequence, calculating the character similarity with each term in the tax professional dictionary to obtain a similarity result set; 从所述相似度结果集中筛选出相似度高于预设阈值的词组及对应的标准税务专业术语,得到模糊匹配序列;Filter out phrases with similarity higher than a preset threshold and corresponding standard tax terminology from the similarity result set to obtain a fuzzy matching sequence; 将所述初步标记序列和所述模糊匹配序列进行合并,生成带有税务术语标记的目标序列。The preliminary tag sequence and the fuzzy matching sequence are merged to generate a target sequence with tax term tags. 4.根据权利要求1所述的用于税务知识问答系统的检索方法,其特征在于,基于所述目标序列,利用预设的税务要素识别规则提取税务要素,具体包括如下步骤:4. 
The retrieval method for a tax knowledge question-and-answer system according to claim 1 is characterized in that, based on the target sequence, the tax elements are extracted using a preset tax element identification rule, specifically comprising the following steps: 将所述目标序列与预设的关联词表进行匹配,识别序列中的关联词及位置,得到关联词位置序列;Matching the target sequence with a preset associated word table, identifying the associated words and their positions in the sequence, and obtaining an associated word position sequence; 根据所述关联词位置序列,将所述目标序列划分为多个子序列,得到子序列集合;According to the associated word position sequence, the target sequence is divided into a plurality of subsequences to obtain a subsequence set; 利用预设的税务要素识别规则,从所述子序列集合中分别提取税务要素,得到初步要素集;Using a preset tax element identification rule, respectively extract tax elements from the subsequence set to obtain a preliminary element set; 根据所述关联词位置序列中关联词的语义类型,标记所述初步要素集中要素之间的逻辑关系,生成包含要素及关联关系的完整要素集。According to the semantic types of the associated words in the associated word position sequence, the logical relationships between the elements in the preliminary element set are marked to generate a complete element set including the elements and the associated relationships. 5.根据权利要求1所述的用于税务知识问答系统的检索方法,其特征在于,在税务知识库中执行检索匹配后,还包括如下步骤:5. 
The retrieval method for the tax knowledge question and answer system according to claim 1 is characterized in that after performing the search and matching in the tax knowledge base, it also includes the following steps: 将检索得到的候选知识列表与用户原始查询语句组合,构建输入提示;Combine the retrieved candidate knowledge list with the user's original query sentence to construct an input prompt; 将所述输入提示传入大语言模型,获取每条候选知识的相关性分析结果;The input prompt is passed into the large language model to obtain the relevance analysis result of each candidate knowledge; 基于所述相关性分析结果,提取每条候选知识的相关性得分;Based on the correlation analysis result, extracting the correlation score of each candidate knowledge; 根据所述相关性得分对候选知识重新排序,生成结果列表;Reorder the candidate knowledge according to the relevance scores to generate a result list; 判断结果列表中最高得分是否超过预设阈值,若超过则返回对应知识作为回复。Determine whether the highest score in the result list exceeds the preset threshold. If so, return the corresponding knowledge as a response. 6.根据权利要求5所述的用于税务知识问答系统的检索方法,其特征在于,当所述结果列表中最高得分未超过预设阈值时,还包括如下步骤:6. The retrieval method for a tax knowledge question-and-answer system according to claim 5, characterized in that when the highest score in the result list does not exceed a preset threshold, it further comprises the following steps: 从所述结果列表中提取相关性得分不低于预设推荐阈值的知识条目,生成候选推荐集合;Extracting knowledge items with a relevance score not lower than a preset recommendation threshold from the result list to generate a candidate recommendation set; 根据所述候选推荐集合中的每个知识条目的相关性得分进行降序排序,生成有序推荐列表;Sorting in descending order according to the relevance score of each knowledge item in the candidate recommendation set to generate an ordered recommendation list; 将所述有序推荐列表格式化为用户可读的推荐界面并返回。The ordered recommendation list is formatted into a user-readable recommendation interface and returned. 7.根据权利要求5所述的用于税务知识问答系统的检索方法,其特征在于,返回对应知识作为回复之后,还包括如下步骤:7. 
The retrieval method for a tax knowledge question-and-answer system according to claim 5, characterized in that, after the corresponding knowledge is returned as the reply, the method further comprises the following steps:
generating an interaction record containing the user query, the ID of the returned knowledge, and the return time, and assigning it a unique identifier;
receiving and recording the user's feedback information based on the identifier of the interaction record, and generating a feedback record;
calculating a weight adjustment value for the knowledge item according to the feedback type in the feedback record;
adding the weight adjustment value to the original weight value of the corresponding entry in the knowledge base to update the original weight value.
8. A retrieval system for a tax knowledge question-and-answer system, characterized by comprising:
a receiving module, configured to receive a tax question query sentence input by a user;
a word segmentation and tagging module, configured to perform word segmentation and part-of-speech tagging on the query sentence to obtain a word segmentation sequence with part-of-speech tags;
a term tagging module, configured to identify and tag the tax terminology in the word segmentation sequence using a pre-built tax terminology dictionary, and to generate a target sequence with tax term tags;
an element extraction module, configured to extract tax elements from the target sequence using preset tax element identification rules;
a retrieval and matching module, configured to construct a query vector from the extracted tax elements and perform retrieval and matching in the tax knowledge base.
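By way of illustration only, the feedback steps of claim 7 could be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `InteractionRecord`, `FEEDBACK_DELTAS` and `apply_feedback`, the feedback types and the adjustment values are all hypothetical, since the claim does not specify them.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical adjustment values per feedback type (not given in the claim).
FEEDBACK_DELTAS = {"helpful": 0.1, "not_helpful": -0.1}


@dataclass
class InteractionRecord:
    """User query, returned knowledge ID, return time, and a unique identifier."""
    query: str
    knowledge_id: str
    returned_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    record_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def apply_feedback(weights, feedback_log, record, feedback_type):
    """Record the feedback and add the adjustment to the entry's stored weight."""
    # Feedback record, linked to the interaction by its identifier.
    feedback_log.append({"record_id": record.record_id, "type": feedback_type})
    # Unknown feedback types leave the weight unchanged (an assumption).
    delta = FEEDBACK_DELTAS.get(feedback_type, 0.0)
    weights[record.knowledge_id] = weights.get(record.knowledge_id, 0.0) + delta
    return weights[record.knowledge_id]
```

In this sketch `weights` is a plain dictionary mapping knowledge IDs to weights; a real knowledge base would persist the updated value.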
9. An electronic device, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the retrieval method for a tax knowledge question-and-answer system according to any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the retrieval method for a tax knowledge question-and-answer system according to any one of claims 1 to 7.
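As an illustration of the decision logic in claims 5 and 6, the re-ranking and threshold steps could be sketched as below. This is a hedged sketch, not the claimed implementation: `answer_or_recommend` is a hypothetical name, `score_fn` is a stand-in for the large-language-model relevance analysis and score extraction, and both default threshold values are assumptions.

```python
def answer_or_recommend(query, candidates, score_fn,
                        answer_threshold=0.8, recommend_threshold=0.5):
    """Re-rank candidates by relevance; return one answer or a recommendation list.

    score_fn(query, candidate) -> float is a placeholder for the LLM-based
    relevance scoring; the threshold defaults are illustrative assumptions.
    """
    # Score every candidate and sort in descending order of relevance.
    scored = sorted(
        ((score_fn(query, c), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Claim 5: the top score exceeds the preset threshold, so return that knowledge.
    if scored and scored[0][0] > answer_threshold:
        return {"type": "answer", "items": [scored[0][1]]}
    # Claim 6: otherwise keep items at or above the recommendation threshold,
    # already in descending order of score.
    recommendations = [c for s, c in scored if s >= recommend_threshold]
    return {"type": "recommendations", "items": recommendations}
```

Formatting the ordered list into a user-readable interface is left out of the sketch; the returned dictionary is only one possible intermediate form.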
CN202411785195.5A 2024-12-06 2024-12-06 A retrieval method, system and electronic device for tax knowledge question and answer system Pending CN119622930A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411785195.5A CN119622930A (en) 2024-12-06 2024-12-06 A retrieval method, system and electronic device for tax knowledge question and answer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411785195.5A CN119622930A (en) 2024-12-06 2024-12-06 A retrieval method, system and electronic device for tax knowledge question and answer system

Publications (1)

Publication Number Publication Date
CN119622930A (en) 2025-03-14

Family

ID=94904810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411785195.5A Pending CN119622930A (en) 2024-12-06 2024-12-06 A retrieval method, system and electronic device for tax knowledge question and answer system

Country Status (1)

Country Link
CN (1) CN119622930A (en)

Similar Documents

Publication Publication Date Title
CN110765244B (en) Method, device, computer equipment and storage medium for obtaining answering operation
US11210468B2 (en) System and method for comparing plurality of documents
US10489439B2 (en) System and method for entity extraction from semi-structured text documents
US20150112664A1 (en) System and method for generating a tractable semantic network for a concept
US8185536B2 (en) Rank-order service providers based on desired service properties
CN114254653A (en) Scientific and technological project text semantic extraction and representation analysis method
US11023503B2 (en) Suggesting text in an electronic document
CN108170715B (en) Text structuralization processing method
US11899727B2 (en) Document digitization, transformation and validation
US12333236B2 (en) System and method for automatically tagging documents
US11868313B1 (en) Apparatus and method for generating an article
US11922515B1 (en) Methods and apparatuses for AI digital assistants
CN118673041A (en) Method and device for searching power business database table
CN118277509A (en) Knowledge graph-based data set retrieval method
CN118070784A (en) Method, device, equipment and storage medium for constructing entity dictionary in vertical industry field
CN117993876B (en) Resume evaluation system, method, device and medium
Bouhoun et al. Information retrieval using domain adapted language models: application to resume documents for HR recruitment assistance
US20240281747A1 (en) Apparatus and method for generating system improvement data
CN117493962A (en) Method and device for classifying bulk commodity events by fusing event attributes
CN119622930A (en) A retrieval method, system and electronic device for tax knowledge question and answer system
Drury A Text Mining System for Evaluating the Stock Market's Response To News
Xian et al. DLEE: a dataset for Chinese document-level legal event extraction
US12293149B2 (en) Apparatus and method for generating an article
CN118885646B (en) Database retrieval method, device, electronic equipment and medium
Kamath et al. Semantic similarity based context-aware web service discovery using nlp techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination