CN118798201A

CN118798201A - A new question correction method and system based on large model

Info

Publication number: CN118798201A
Application number: CN202411272928.5A
Authority: CN
Inventors: 马赫; 倪小明; 郭南明; 杜育林; 洪潜凯; 刘佳荣; 李剑; 崔浩松; 蒋亦萱
Original assignee: Wangcai Technology Guangzhou Group Co ltd
Current assignee: Wangcai Technology Guangzhou Group Co ltd
Priority date: 2024-09-12
Filing date: 2024-09-12
Publication date: 2024-10-18
Anticipated expiration: 2044-09-12
Also published as: CN118798201B

Abstract

The present invention discloses a method and system for correcting errors in new propositions based on a large model, the method comprising S1: obtaining a generated new proposition and its corresponding context information; S2: performing semantic analysis on the new proposition based on a pre-trained language large model to identify potential errors therein; S3: verifying the accuracy of the new proposition using a domain-specific knowledge base; S4: automatically correcting errors in the new proposition based on the analysis results and verification feedback; the method and system for correcting errors in new propositions based on a large model solve the problem of how to develop a more intelligent and flexible error correction method to improve the accuracy and efficiency in processing new propositions. Although some studies in the prior art have attempted to improve the error correction effect by enhancing the adaptability of the model or combining a variety of technical means, there are still problems such as insufficient recognition accuracy and limited ability to handle new types of errors.

Description

A new question correction method and system based on large model

技术领域Technical Field

本发明涉及人工智能技术领域，具体涉及一种基于大模型新命题纠错方法及系统。The present invention relates to the field of artificial intelligence technology, and in particular to a new proposition error correction method and system based on a large model.

背景技术Background Art

在现有技术中，基于自然语言处理（NLP）和机器学习算法的自动纠错技术已经得到了广泛的应用，这些技术通常通过训练模型来识别和修正文本中的语法、拼写和逻辑错误，尤其在语法纠错和拼写检查方面取得了显著进展。现有的纠错系统往往依赖于预先定义的规则或词典，结合统计模型或深度学习模型来检测和修正错误，这些技术在处理标准化文本时表现出较高的准确性和效率，随着自然语言处理技术的发展，特别是大语言模型（LLM）的应用，出现了更加复杂和创新性的文本生成任务。大语言模型能够生成包含新命题或非标准表达的文本，这为现有的纠错方法带来了新的挑战。In the existing technology, automatic error correction technologies based on natural language processing (NLP) and machine learning algorithms have been widely used. These technologies usually identify and correct grammatical, spelling and logical errors in texts by training models, especially in grammar correction and spelling checking. Existing error correction systems often rely on pre-defined rules or dictionaries, combined with statistical models or deep learning models to detect and correct errors. These technologies show high accuracy and efficiency when processing standardized texts. With the development of natural language processing technology, especially the application of large language models (LLMs), more complex and innovative text generation tasks have emerged. Large language models can generate texts containing new propositions or non-standard expressions, which brings new challenges to existing error correction methods.

现有的纠错技术通常依赖于预定义的规则或标准的语言结构，但在面对大模型生成的创新性表述时，这些技术往往难以适应，无法有效识别和修正其中的错误。尤其是在处理新命题时，由于这些表述可能不符合已有的语言规范或包含创新性的逻辑推导，现有的纠错技术往往会出现较高的误判率。此外，新命题的独特性和多样性使得现有技术难以覆盖所有可能的错误类型，这进一步增加了纠错的难度。因此，如何开发更加智能和灵活的纠错方法，以提高在处理新命题时的准确性和效率，成为当前研究中的一个关键问题，目前，虽然已经有一些研究尝试通过增强模型的适应性或结合多种技术手段来提升纠错效果，但仍然存在识别准确性不足、对新类型错误处理能力有限等问题。Existing error correction techniques usually rely on predefined rules or standard language structures, but when faced with innovative expressions generated by large models, these techniques are often difficult to adapt and cannot effectively identify and correct errors. Especially when dealing with new propositions, because these expressions may not conform to existing language specifications or contain innovative logical deductions, existing error correction techniques often have a high misjudgment rate. In addition, the uniqueness and diversity of new propositions make it difficult for existing technologies to cover all possible error types, which further increases the difficulty of error correction. Therefore, how to develop more intelligent and flexible error correction methods to improve accuracy and efficiency when dealing with new propositions has become a key issue in current research. At present, although some studies have tried to improve the error correction effect by enhancing the adaptability of the model or combining multiple technical means, there are still problems such as insufficient recognition accuracy and limited ability to handle new types of errors.

发明内容Summary of the invention

本发明的目的在于提供一种基于大模型新命题纠错方法及系统，解决在如何开发更加智能和灵活的纠错方法，以提高在处理新命题时的准确性和效率问题上，现有技术虽然已经有一些研究尝试通过增强模型的适应性或结合多种技术手段来提升纠错效果，但仍然存在识别准确性不足、对新类型错误处理能力有限等问题。The purpose of the present invention is to provide a new proposition error correction method and system based on a large model, so as to solve the problem of how to develop a more intelligent and flexible error correction method to improve the accuracy and efficiency when processing new propositions. Although there have been some studies in the prior art that have attempted to improve the error correction effect by enhancing the adaptability of the model or combining a variety of technical means, there are still problems such as insufficient recognition accuracy and limited ability to handle new types of errors.

为实现上述目的，本发明提供如下技术方案：一种基于大模型新命题纠错方法，所述方法包括：To achieve the above object, the present invention provides the following technical solution: a method for correcting errors in new propositions based on a large model, the method comprising:

S1：获取生成的新命题及其对应的语境信息；S1: Obtain the generated new proposition and its corresponding context information;

S2：基于预训练的语言大模型对新命题进行语义分析,识别其中的潜在错误；S2: Perform semantic analysis on new propositions based on the pre-trained language model to identify potential errors;

S3：利用领域特定知识库验证所述新命题的准确性；S3: Use domain-specific knowledge base to verify the accuracy of the new proposition;

S4：根据分析结果和验证反馈自动修正新命题中的错误。S4: Automatically correct errors in new propositions based on analysis results and verification feedback.

优选的，所述步骤S1获取生成的新命题及其对应的语境信息具体包括：Preferably, the step S1 of obtaining the generated new proposition and its corresponding context information specifically includes:

通过API接口接收输出的新命题文本；Receive the output new proposition text through the API interface;

提取新命题文本中的关键词汇；Extract key words from new proposition text;

基于关键词汇构建语境上下文环境；Build context based on key words;

利用自然语言处理技术识别新命题中的实体和关系。Leverage natural language processing techniques to identify entities and relations in new propositions.

优选的，所述步骤S2中基于预训练的语言大模型对所述新命题进行语义分析这一步骤具体包括：Preferably, the step of performing semantic analysis on the new proposition based on the pre-trained language model in step S2 specifically includes:

将新命题输入至预训练语言大模型以获得语义向量表示；Input the new proposition into the pre-trained language model to obtain a semantic vector representation;

基于语义向量表示计算新命题与领域内标准表述之间的相似度S；Calculate the similarity S between the new proposition and the standard expression in the field based on the semantic vector representation;

如果相似度S低于预设阈值T，则判定新命题可能存在语义错误；If the similarity S is lower than the preset threshold T, it is determined that the new proposition may have semantic errors;

对比新命题与领域内标准表述的关键差异点。Compare the key differences between the new proposition and the standard statements in the field.

优选的，所述基于语义向量表示计算新命题与领域内标准表述之间的相似度S具体包括：Preferably, the calculating the similarity S between the new proposition and the standard expression in the field based on the semantic vector representation specifically includes:

对新命题的语义向量表示进行归一化处理；Normalize the semantic vector representation of the new proposition;

计算归一化后的语义向量表示与领域内标准表述向量表示之间的余弦相似度C；Calculate the cosine similarity C between the normalized semantic vector representation and the standard expression vector representation in the field;

如果余弦相似度C小于7，则认为新命题与标准表述不匹配，即当C<0.7时，判定新命题与标准表述存在较大差异；If the cosine similarity C is less than 7, it is considered that the new proposition does not match the standard statement, that is, when C < 0.7, it is judged that there is a large difference between the new proposition and the standard statement;

基于不匹配情况调整相似度S的计算方式；Adjust the calculation method of similarity S based on the mismatch situation;

其中，所述计算归一化后的语义向量表示与领域内标准表述向量表示之间的余弦相似度C具体步骤包括：The specific steps of calculating the cosine similarity C between the normalized semantic vector representation and the standard expression vector representation in the field include:

确定新命题语义向量，表示V1和标准表述向量，表示V2；Determine the semantic vector of the new proposition, denoted V1, and the vector of the standard statement, denoted V2;

计算V1与V2的点积P；Calculate the dot product P of V1 and V2;

计算V1和V2的模长M1和M2；Calculate the modulus lengths M1 and M2 of V1 and V2;

根据公式余弦相似度C=P/(M1×M2)计算余弦相似度。The cosine similarity is calculated according to the formula cosine similarity C=P/(M1×M2).

优选的，所述步骤S2中基于预训练的语言大模型对所述新命题进行语义分析,识别其中的潜在错误的识别过程具体步骤包括：Preferably, the specific steps of performing semantic analysis on the new proposition based on the pre-trained language model in step S2 to identify potential errors therein include:

获取新命题的词法结构信息；Obtaining lexical structure information of new propositions;

基于语言模型计算所述新命题的语法概率得分P（语法）；Calculate the grammatical probability score P(grammar) of the new proposition based on the language model;

判断语法概率得分P（语法）是否低于第一预设阈值θ1；Determine whether the grammar probability score P(grammar) is lower than a first preset threshold θ1;

如果P（语法）<θ1，则识别新命题可能存在语法错误；If P(grammar) < θ1, then the recognition of the new proposition may contain grammatical errors;

其中，所述基于语言模型计算所述新命题的语法概率得分P（语法）的具体步骤包括：The specific steps of calculating the grammatical probability score P(grammar) of the new proposition based on the language model include:

对新命题进行分词处理得到词汇序列；Perform word segmentation on the new proposition to obtain a vocabulary sequence;

基于语言模型计算所述词汇序列的概率得分P（词汇）；Calculate a probability score P(vocabulary) of the vocabulary sequence based on a language model;

计算新命题的上下文相关性得分P（上下文）；Calculate the context relevance score P(context) of the new proposition;

根据公式P（语法）=P（词汇）×P（上下文）计算语法概率得分P（语法）。The grammatical probability score P(grammar) is calculated according to the formula P(grammar) = P(vocabulary) × P(context).

优选的，所述计算新命题的上下文相关性得分P（上下文）的具体步骤包括：Preferably, the specific steps of calculating the context relevance score P(context) of the new proposition include:

获取所述新命题的前后文信息；Obtaining context information of the new proposition;

基于语言模型计算前后文信息的相关性得分P（前）和P（后）；Calculate the relevance scores P(previous) and P(next) of the context information based on the language model;

判断所述相关性得分P（前）和P（后）是否均高于第二预设阈值θ2；Determine whether the correlation scores P(front) and P(back) are both higher than a second preset threshold θ2;

如果P（前）>θ2且P（后）>θ2，则P（上下文）= (P（前）+ P（后）)/2；If P(pre) > θ2 and P(post) > θ2, then P(context) = (P(pre) + P(post))/2;

其中，所述基于语言模型计算前后文信息的相关性得分P（前）和P（后）的具体步骤包括：The specific steps of calculating the relevance scores P(previous) and P(after) of the context information based on the language model include:

提取新命题的前后文关键词；Extract the keywords of the context of the new proposition;

基于语言模型计算所述关键词在前后文中出现的概率P（关键词）；Calculate the probability P(keyword) of the keyword appearing in the context based on the language model;

计算关键词与新命题之间的语义相似度S（语义）；Calculate the semantic similarity S (semantic) between the keyword and the new proposition;

根据公式P（前）/P（后）= P（关键词）×S（语义）计算相关性得分P（前）和P（后）。The relevance scores P(front) and P(back) are calculated according to the formula P(front)/P(back) = P(keyword) × S(semantic).

优选的，所述计算关键词与新命题之间的语义相似度S（语义）的具体步骤包括：Preferably, the specific steps of calculating the semantic similarity S (semantics) between the keyword and the new proposition include:

将关键词和新命题分别向量化表示为向量V关键词和V新命题；The keywords and new propositions are vectorized into vectors V keywords and V new propositions respectively;

计算向量V关键词和V新命题之间的余弦相似度COS；Calculate the cosine similarity COS between the vector V keyword and V new proposition;

如果COS>第三预设阈值θ3，则S（语义）=COS。If COS>the third preset threshold θ3, then S(semantic)=COS.

优选的，所述步骤S3具体包括：Preferably, the step S3 specifically includes:

从预定义的领域特定知识库中提取相关数据，该知识库包含了广泛的事实、规则、逻辑关系以及领域特定的概念。Extract relevant data from a predefined domain-specific knowledge base that contains a wide range of facts, rules, logical relationships, and domain-specific concepts.

设知识库为K，新命题为P，Let the knowledge base be K, the new proposition be P,

对于P中的每个子命题Pi，从K中寻找与Pi语义最接近的命题集合 {k1,k2,…,kn}，For each sub-proposition Pi in P, find the set of propositions {k1, k2, …, kn} in K that is closest to Pi in semantics.

计算命题Pi与K中对应命题Kj的相似度，使用余弦相似度公式：，Calculate the similarity between proposition Pi and the corresponding proposition Kj in K , using the cosine similarity formula: ,

其中，和是对应命题的语义向量表示；in, and is the semantic vector representation of the corresponding proposition;

根据知识库中的规则和逻辑，验证新命题与已知知识的逻辑一致性，对新命题PPP 的整体准确性进行评估，综合考虑各个子命题的相似度和逻辑一致性，得出新命题的置信度分数 C(P)：According to the rules and logic in the knowledge base, the logical consistency of the new proposition and the known knowledge is verified, the overall accuracy of the new proposition PPP is evaluated, and the similarity and logical consistency of each sub-proposition are comprehensively considered to obtain the confidence score C(P) of the new proposition:

， ,

其中，是逻辑一致性得分。in, is the logical consistency score.

优选的，所述步骤S4具体包括：Preferably, the step S4 specifically includes:

基于步骤S3中的相似度和逻辑一致性结果，确定命题中错误的位置，定位每个子命题Pi的错误点，如果小于预设阈值，则认为该子命题存在错误；Based on the similarity and logical consistency results in step S3, determine the location of the error in the proposition and locate the error point of each sub-proposition Pi. If Less than the preset threshold , then the sub-proposition is considered to be wrong;

利用领域特定知识库中的正确命题集合{k1,k2,…,kn}进行错误修正，使用替换策略，选择与Pi相似度最高且逻辑一致性最佳的命题Kj替换Pi中的错误部分，修正后的命题为：Use the correct proposition set {k1, k2, ..., kn} in the domain-specific knowledge base to correct errors, use the replacement strategy, select the proposition Kj with the highest similarity to Pi and the best logical consistency to replace the erroneous part of Pi, and the corrected proposition for:

； ;

将修正后的新命题再次通过知识库验证，确保其语义和逻辑的一致，如果修正后的命题仍不满足条件，则进一步调整，直至命题的置信度达到设定的门槛值。The revised new proposition Verify the knowledge base again to ensure the consistency of its semantics and logic. If the revised proposition still does not meet the conditions, further adjustments are made until the confidence level of the proposition reaches Reach the set threshold.

一种基于大模型新命题纠错系统，采用所述的基于大模型新命题纠错方法，所述系统包括：A new proposition error correction system based on a large model, using the new proposition error correction method based on a large model, the system comprising:

语境信息获取模块，该语境信息获取模块用于获取大模型生成的新命题及其对应的语境信息；A context information acquisition module, which is used to acquire new propositions generated by the large model and their corresponding context information;

语义分析模块，该语义分析模块基于预训练的语言模型，对新命题进行语义分析，识别新命题中的潜在错误；A semantic analysis module, which performs semantic analysis on new propositions based on a pre-trained language model to identify potential errors in new propositions;

验证模块，该验证模块利用领域特定知识库验证新命题的准确性；A verification module that verifies the accuracy of new propositions using a domain-specific knowledge base;

修正模块，该修正模块根据语义分析结果和验证反馈，自动修正新命题中的错误。The correction module automatically corrects errors in new propositions based on semantic analysis results and verification feedback.

由上述技术方案可知，本发明具有如下有益效果：It can be seen from the above technical solution that the present invention has the following beneficial effects:

该基于大模型新命题纠错方法及系统，通过获取生成的新命题及其对应的语境信息，基于预训练的语言大模型对新命题进行语义分析,识别其中的潜在错误，利用领域特定知识库验证所述新命题的准确性，根据分析结果和验证反馈自动修正新命题中的错误，使得新命题纠错，尤其是在处理那些没有明确规则可循的新类型错误时，更加智能和灵活，提高了准确性和效率，解决了在如何开发更加智能和灵活的纠错方法，以提高在处理新命题时的准确性和效率问题上，现有技术虽然已经有一些研究尝试通过增强模型的适应性或结合多种技术手段来提升纠错效果，但仍然存在识别准确性不足、对新类型错误处理能力有限等问题。The method and system for correcting new propositions based on a large model obtain the generated new propositions and their corresponding contextual information, perform semantic analysis on the new propositions based on a pre-trained language large model, identify potential errors therein, verify the accuracy of the new propositions using a domain-specific knowledge base, and automatically correct errors in the new propositions based on the analysis results and verification feedback, so that the correction of new propositions is more intelligent and flexible, especially when dealing with new types of errors that do not have clear rules to follow, and improves accuracy and efficiency. It solves the problem of how to develop a more intelligent and flexible correction method to improve the accuracy and efficiency when dealing with new propositions. Although some studies in the prior art have attempted to improve the correction effect by enhancing the adaptability of the model or combining a variety of technical means, there are still problems such as insufficient recognition accuracy and limited ability to handle new types of errors.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1为本发明方法流程示意图。FIG1 is a schematic flow chart of the method of the present invention.

具体实施方式DETAILED DESCRIPTION

下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。The following will be combined with the drawings in the embodiments of the present invention to clearly and completely describe the technical solutions in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of the present invention.

如图1所示，一种基于大模型新命题纠错方法，所述方法包括：As shown in FIG1 , a method for correcting errors in new propositions based on a large model includes:

S1：通过获取生成的新命题及其对应的语境信息，我们首先确保了能够全面理解该命题在特定上下文中的含义。这一过程通常涉及输出中提取出具体的句子或短语，并收集与之相关的背景资料、对话历史等辅助信息。例如，假设生成了一个关于气候变化的新命题：“全球气温在过去十年中下降了5度。”为了更好地理解这个命题，我们需要收集有关时间范围、数据来源、测量标准等细节。这一步骤对于后续的语义分析至关重要，因为它提供了必要的上下文来评估命题的真实性和准确性。此外，在实际应用中，我们可能还需要考虑如何高效地从大量的文本数据中筛选出相关的信息片段，以支持后续的分析工作。S1: By obtaining the generated new proposition and its corresponding contextual information, we first ensure that we can fully understand the meaning of the proposition in a specific context. This process usually involves extracting specific sentences or phrases from the output and collecting related background information, conversation history and other auxiliary information. For example, suppose a new proposition about climate change is generated: "Global temperatures have dropped by 5 degrees in the past decade." In order to better understand this proposition, we need to collect details about the time range, data source, measurement standards, etc. This step is crucial for subsequent semantic analysis because it provides the necessary context to evaluate the truthfulness and accuracy of the proposition. In addition, in practical applications, we may also need to consider how to efficiently filter out relevant information fragments from a large amount of text data to support subsequent analysis.

S2：通过基于预训练的语言大模型对所述新命题进行语义分析，识别其中的潜在错误。在这个阶段，我们利用先进的自然语言处理技术，如BERT、GPT-3等预训练模型，来深入分析新命题的语义结构。这些模型经过大量文本数据的训练，能够捕捉到复杂的语言模式和上下文关系。例如，对于前面提到的关于全球气温变化的新命题，预训练模型可能会发现“过去十年中下降了5度”这一表述与已知的气候变化趋势不符，因为根据现有的科学共识，全球气温实际上是在上升而非下降。这种分析不仅限于事实性的错误，还可以涵盖逻辑不一致、语义模糊等问题。通过这种方式，我们可以更准确地识别出新命题中存在的潜在问题。S2: Perform semantic analysis on the new proposition based on a pre-trained language model to identify potential errors. At this stage, we use advanced natural language processing techniques, such as pre-trained models such as BERT and GPT-3, to deeply analyze the semantic structure of the new proposition. These models are trained on a large amount of text data and are able to capture complex language patterns and contextual relationships. For example, for the new proposition about global temperature change mentioned earlier, the pre-trained model may find that the statement "a drop of 5 degrees in the past decade" is inconsistent with the known trend of climate change, because according to the existing scientific consensus, global temperatures are actually rising rather than falling. This analysis is not limited to factual errors, but can also cover problems such as logical inconsistencies and semantic ambiguity. In this way, we can more accurately identify potential problems in new propositions.

S3：通过利用领域特定知识库验证所述新命题的准确性，我们进一步确保了新命题的可靠性和真实性。领域特定知识库包含了特定领域的专业知识和数据，比如气象学数据库、科学研究成果等。继续以上述气候变化的例子，我们可以查询气象学数据库中关于全球气温变化的数据记录，以验证“过去十年中下降了5度”这一说法是否正确。如果数据库显示全球气温实际上是上升的，那么就可以确定原命题存在错误。此外，还可以利用专家系统或专业文献来进一步确认新命题的准确性。这种方法的优势在于它能够将大模型生成的内容与权威数据源进行比对，从而提高识别错误的精度。S3: By verifying the accuracy of the new proposition using a domain-specific knowledge base, we further ensure the reliability and authenticity of the new proposition. A domain-specific knowledge base contains expertise and data in a specific field, such as a meteorological database, scientific research results, etc. Continuing with the above example of climate change, we can query the data records on global temperature changes in the meteorological database to verify whether the statement "it has dropped by 5 degrees in the past decade" is correct. If the database shows that global temperatures are actually rising, then it can be determined that the original proposition is wrong. In addition, expert systems or professional literature can be used to further confirm the accuracy of the new proposition. The advantage of this approach is that it can compare the content generated by the large model with the authoritative data source, thereby improving the accuracy of identifying errors.

S4：通过根据分析结果和验证反馈自动修正所述新命题中的错误，我们最终实现了对新命题的有效纠错。基于前几步的分析和验证，系统会自动提出修正建议。例如，对于前述关于全球气温变化的新命题，系统可能会建议将其修改为：“全球气温在过去十年中上升了X度。”这里的X度是根据领域特定知识库中的数据计算得出的。此外，修正过程还可能包括对语义模糊或逻辑不一致的部分进行调整，以确保新命题既准确又清晰。通过这种方式，我们不仅能够纠正错误，还能提高新命题的质量，使其更加符合实际情况。S4: By automatically correcting the errors in the new proposition according to the analysis results and verification feedback, we finally achieved effective error correction of the new proposition. Based on the analysis and verification in the previous steps, the system automatically proposes correction suggestions. For example, for the aforementioned new proposition about global temperature change, the system may suggest that it be modified to: "Global temperature has risen by X degrees in the past decade." The X degrees here are calculated based on data in the domain-specific knowledge base. In addition, the correction process may also include adjustments to semantically ambiguous or logically inconsistent parts to ensure that the new proposition is both accurate and clear. In this way, we can not only correct errors, but also improve the quality of new propositions to make them more in line with actual conditions.

通过上述步骤，可解决“如何有效地识别和纠正大模型生成的新命题中的错误”的问题，整个流程从获取新命题及其语境信息开始，通过语义分析识别潜在错误，利用领域知识验证准确性，并最终自动修正错误，形成了一套完整的纠错机制，这种方法不仅提高了新命题的准确性，还增强了生成内容的可靠性，对于提升人工智能系统的整体性能具有重要意义。Through the above steps, the problem of "how to effectively identify and correct errors in new propositions generated by large models" can be solved. The whole process starts with obtaining new propositions and their contextual information, identifying potential errors through semantic analysis, using domain knowledge to verify accuracy, and finally automatically correcting errors, forming a complete error correction mechanism. This method not only improves the accuracy of new propositions, but also enhances the reliability of generated content, which is of great significance for improving the overall performance of artificial intelligence systems.

接下来，描述获取生成的新命题及其对应的语境信息的具体步骤：Next, the specific steps for obtaining the generated new proposition and its corresponding context information are described:

通过API接口接收输出的新命题文本。这一过程涉及与大模型的交互，通常通过调用预先定义好的应用程序编程接口API来实现。具体来说，当生成一段新的命题文本后，该文本将通过API接口传输给纠错系统，以便进一步处理和分析。Receive the output new proposition text through the API interface. This process involves interaction with the large model, which is usually implemented by calling a pre-defined application programming interface API. Specifically, when a new proposition text is generated, the text will be transmitted to the error correction system through the API interface for further processing and analysis.

提取新命题文本中的关键词汇。在接收到新命题文本之后，系统需要对文本内容进行初步解析，以识别出其中的关键信息。这一步骤可以通过多种自然语言处理技术来完成，例如词性标注、命名实体识别等，从而筛选出对理解文本意义至关重要的词汇。Extract key words from the new proposition text. After receiving the new proposition text, the system needs to perform a preliminary analysis of the text content to identify the key information. This step can be completed through a variety of natural language processing technologies, such as part-of-speech tagging, named entity recognition, etc., to screen out words that are crucial to understanding the meaning of the text.

基于关键词汇构建语境上下文环境。为了更准确地理解新命题的意义以及它所处的语境，系统会利用提取到的关键词汇来构建一个上下文环境。这可能涉及到对关键词汇之间的关联性进行分析，并结合相关领域的知识库来增强对文本背景的理解。Build context based on key words. In order to more accurately understand the meaning of the new proposition and the context it is in, the system will use the extracted key words to build a context. This may involve analyzing the correlation between key words and combining the knowledge base of related fields to enhance the understanding of the text background.

利用自然语言处理技术识别新命题中的实体和关系。最后，系统还需要深入分析新命题中的实体信息及其相互间的关系。这一步骤可以借助诸如依存句法分析、语义角色标注等高级自然语言处理技术来完成，目的是为了更全面地理解新命题的内容及其潜在含义。Use natural language processing technology to identify entities and relationships in new propositions. Finally, the system needs to deeply analyze the entity information in the new proposition and their relationships. This step can be completed with the help of advanced natural language processing technologies such as dependency parsing and semantic role labeling, in order to more comprehensively understand the content of the new proposition and its potential meaning.

接下来，描述涉及的基于预训练的语言大模型对所述新命题进行语义分析的这一过程具体步骤：Next, the specific steps of the process of semantically analyzing the new proposition based on the pre-trained language model are described:

通过将新命题输入至预训练语言大模型以获得语义向量表示。这一过程涉及到将待分析的新命题作为输入传递给预先训练好的语言模型。该语言模型通常是在大量文本数据上进行过训练的深度学习模型，能够捕捉到语言中的复杂结构和语义信息。当新命题被输入后，模型会对其进行处理并生成一个或多个人工向量，这些向量可以被视为新命题的语义表示。The semantic vector representation is obtained by inputting the new proposition into a pre-trained language model. This process involves passing the new proposition to be analyzed as input to a pre-trained language model. The language model is usually a deep learning model trained on a large amount of text data, which can capture the complex structure and semantic information in the language. When a new proposition is input, the model processes it and generates one or more artificial vectors, which can be regarded as the semantic representation of the new proposition.

通过基于语义向量表示计算新命题与领域内标准表述之间的相似度S。在获得了新命题的语义向量表示之后，接下来需要计算这个表示与领域内已知的标准表述之间的相似度。这一步骤可以通过多种方式实现，例如使用余弦相似度、欧几里得距离或其他度量方法来量化两个向量之间的相似程度。相似度S的值反映了新命题与标准表述在语义上的接近程度。The similarity S between the new proposition and the standard expression in the field is calculated based on the semantic vector representation. After obtaining the semantic vector representation of the new proposition, the next step is to calculate the similarity between this representation and the known standard expression in the field. This step can be achieved in many ways, such as using cosine similarity, Euclidean distance or other measurement methods to quantify the similarity between two vectors. The value of similarity S reflects the semantic closeness between the new proposition and the standard expression.

如果相似度S低于预设阈值T，则判定新命题可能存在语义错误。在计算出相似度S之后，将其与一个事先设定的阈值T进行比较。如果S小于T，则认为新命题与领域内的标准表述存在较大的语义差异，可能包含语义错误或不准确的表述。这个阈值的选择取决于具体的应用场景和对错误容忍度的要求。If the similarity S is lower than the preset threshold T, the new proposition is judged to have possible semantic errors. After calculating the similarity S, it is compared with a pre-set threshold T. If S is less than T, it is considered that the new proposition has a large semantic difference from the standard expression in the field and may contain semantic errors or inaccurate expressions. The choice of this threshold depends on the specific application scenario and the requirements for error tolerance.

对比新命题与领域内标准表述的关键差异点。一旦确定了新命题可能存在语义错误，接下来就需要进一步分析新命题与标准表述之间的关键差异点。这可以通过对比两者的语义向量表示来完成，找出导致相似度较低的具体原因。例如，可以识别出新命题中的特定词汇或短语与标准表述不一致的地方，并据此提出修改建议以提高新命题的准确性。Compare the key differences between the new proposition and the standard statement in the field. Once it is determined that the new proposition may have semantic errors, the next step is to further analyze the key differences between the new proposition and the standard statement. This can be done by comparing the semantic vector representations of the two to find out the specific reasons for the low similarity. For example, you can identify where specific words or phrases in the new proposition are inconsistent with the standard statement, and make suggestions for changes to improve the accuracy of the new proposition.

接下来，描述基于语义向量表示计算新命题与领域内标准表述之间相似度S的具体步骤：Next, we describe the specific steps of calculating the similarity S between the new proposition and the standard expression in the field based on the semantic vector representation:

首先，对新命题进行语义分析以获得其语义向量表示，并对其进行归一化处理。这一过程确保了不同长度的向量能够在一个统一的标准下进行比较，从而避免了向量长度差异对相似度计算的影响。归一化处理通常采用L2范数或其他标准化方法来完成。First, the new proposition is semantically analyzed to obtain its semantic vector representation, and then normalized. This process ensures that vectors of different lengths can be compared under a unified standard, thereby avoiding the influence of vector length differences on similarity calculations. Normalization is usually done using the L2 norm or other normalization methods.

接着，计算经过归一化处理的新命题语义向量表示与领域内标准表述向量表示之间的余弦相似度C。余弦相似度是一种衡量两个非零向量夹角余弦值的方法，用于评估它们之间的方向相似性。通过计算两个向量点积除以各自模长的乘积得到余弦相似度C。Next, the cosine similarity C between the normalized semantic vector representation of the new proposition and the vector representation of the standard expression in the field is calculated. Cosine similarity is a method to measure the cosine value of the angle between two non-zero vectors, which is used to evaluate the directional similarity between them. The cosine similarity C is obtained by calculating the dot product of the two vectors divided by the product of their respective moduli.

然后，判断余弦相似度C是否小于阈值7，如果余弦相似度C小于7，则认为新命题与标准表述在语义上存在较大差异，即不匹配。需要注意的是，在实际应用中，余弦相似度的取值范围是[-1, 1]，因此这里的阈值7可能是一个笔误或表述错误，正常的阈值应该是在这个范围内选择一个合适的数值。Then, determine whether the cosine similarity C is less than the threshold value 7. If the cosine similarity C is less than 7, it is considered that the new proposition is semantically different from the standard statement, that is, it does not match. It should be noted that in actual applications, the value range of cosine similarity is [-1, 1], so the threshold value 7 here may be a typo or an error in expression. The normal threshold value should be a suitable value within this range.

最后，基于上述不匹配的情况调整相似度S的计算方式。例如，可以引入其他因素如词汇重叠度、句法结构相似度等来综合评估新命题与标准表述之间的相似度S，或者调整余弦相似度的阈值以适应不同的应用场景和需求。Finally, the calculation method of similarity S is adjusted based on the above mismatch. For example, other factors such as vocabulary overlap, syntactic structure similarity, etc. can be introduced to comprehensively evaluate the similarity S between the new proposition and the standard statement, or the threshold of cosine similarity can be adjusted to adapt to different application scenarios and needs.

接下来，描述计算归一化后的语义向量表示与领域内标准表述向量表示之间的余弦相似度C的具体步骤：Next, the specific steps of calculating the cosine similarity C between the normalized semantic vector representation and the standard expression vector representation in the field are described:

确定新命题语义向量，表示“V1”和标准表述向量，表示“V2”；Determine the semantic vector of the new proposition, denoted “V1”, and the vector of the standard statement, denoted “V2”;

在本步骤中，首先需要获取经过预处理和归一化处理后的新命题的语义向量表示V1以及领域内标准表述的向量表示V2。这些向量表示通常是由预先训练好的语言模型生成的，能够捕捉到文本中的语义信息。例如，可以使用BERT、ROBERTA或其他类似的深度学习模型来提取文本的语义特征，并将其转换为固定长度的向量形式。In this step, we first need to obtain the semantic vector representation V1 of the new proposition after preprocessing and normalization, as well as the vector representation V2 of the standard expression in the field. These vector representations are usually generated by a pre-trained language model and can capture the semantic information in the text. For example, BERT, ROBERTA or other similar deep learning models can be used to extract the semantic features of the text and convert them into a fixed-length vector form.

计算V1与V2的点积P；Calculate the dot product P of V1 and V2;

接下来，计算上述得到的两个向量V1和V2之间的点积P。点积是一个数学运算，用于衡量两个向量在相同维度上的元素相乘之后的总和。具体来说，如果V1和V2都是n维向量，则点积P可以通过将V1的第i个元素与V2的第i个元素相乘后求和得到，即：Next, calculate the dot product P between the two vectors V1 and V2 obtained above. The dot product is a mathematical operation that measures the sum of two vectors after multiplying their elements on the same dimension. Specifically, if V1 and V2 are both n-dimensional vectors, the dot product P can be obtained by multiplying the i-th element of V1 with the i-th element of V2 and then summing them, that is:

； ;

其中，V₁（i）表示向量V₁的第i个元素，V₂（i）表示向量V₂的第i个元素，n是向量的维度，×表示元素的乘积，∑表示从第1个到第n个元素的求和。Wherein, V ₁ (i) represents the i-th element of vector V ₁ , V ₂ (i) represents the i-th element of vector V ₂ , n is the dimension of the vector, × represents the product of the elements, and ∑ represents the sum from the 1st to the nth elements.

在计算了点积之后，需要进一步计算两个向量V1和V2各自的模长M1和M2。向量的模长或称为范数是衡量向量长度的一个指标，可以通过对向量各分量的平方求和后再开平方根来获得。对于向量V1，其模长M1计算公式为；After calculating the dot product, we need to further calculate the modulus M1 and M2 of the two vectors V1 and V2. The modulus or norm of a vector is an indicator of the length of the vector, which can be obtained by summing the squares of the components of the vector and then taking the square root. For vector V1, the formula for calculating its modulus M1 is:

； ;

对于向量V2，其模长M2计算公式为。For vector V2, the calculation formula for its modulus M2 is:

； ;

计算余弦相似度C；Calculate the cosine similarity C;

最后一步是计算余弦相似度C，它是通过点积P除以两个向量的模长乘积M1×M2来得到的。余弦相似度C的计算公式为C=P/(M1×M2)。余弦相似度的值范围在-1到1之间，其中1表示两个向量完全相同，0表示两个向量正交即没有相关性，而-1则表示两个向量方向相反。通过这种方式，我们可以量化新命题与领域内标准表述之间的语义相似程度。The last step is to calculate the cosine similarity C, which is obtained by dividing the dot product P by the product of the modulus lengths of the two vectors M1×M2. The formula for calculating cosine similarity C is C=P/(M1×M2). The value range of cosine similarity is between -1 and 1, where 1 means that the two vectors are exactly the same, 0 means that the two vectors are orthogonal, that is, there is no correlation, and -1 means that the two vectors are in opposite directions. In this way, we can quantify the degree of semantic similarity between the new proposition and the standard statement in the field.

接下来，描述关于余弦相似度判断新命题与标准表述匹配程度的具体步骤：Next, we describe the specific steps of judging the matching degree between the new proposition and the standard statement by cosine similarity:

通过计算新命题与标准表述之间的余弦相似度：首先，将新命题和标准表述分别转换为向量表示形式，这可以通过预训练的语义模型来完成。接着，利用这些向量计算两者之间的余弦相似度C。余弦相似度是一种衡量两个非零向量之间角度的度量，其值范围在-1到1之间，值越接近1表示两向量越相似。By calculating the cosine similarity between the new proposition and the standard statement: First, convert the new proposition and the standard statement into vector representation, which can be done through a pre-trained semantic model. Then, use these vectors to calculate the cosine similarity C between the two. Cosine similarity is a measure of the angle between two non-zero vectors, and its value range is between -1 and 1. The closer the value is to 1, the more similar the two vectors are.

通过设定阈值来判断新命题与标准表述的匹配程度：当计算出的余弦相似度C小于0.7时即C<0.7，则认为新命题与标准表述存在较大差异，即不匹配。这意味着新命题与标准表述在语义上存在显著的不同，可能是因为新命题包含了一些错误或者不准确的信息。The matching degree between the new proposition and the standard statement is judged by setting a threshold: when the calculated cosine similarity C is less than 0.7, that is, C<0.7, it is considered that there is a large difference between the new proposition and the standard statement, that is, it does not match. This means that there is a significant difference in semantics between the new proposition and the standard statement, which may be because the new proposition contains some wrong or inaccurate information.

根据判断结果采取相应的处理措施：一旦确定新命题与标准表述存在较大差异，系统可以进一步分析新命题中的具体问题，并提供修正建议或直接给出正确的标准表述作为参考，以帮助用户理解并纠正新命题中的错误。Take appropriate measures based on the judgment results: Once it is determined that there are significant differences between the new proposition and the standard statement, the system can further analyze the specific problems in the new proposition and provide correction suggestions or directly give the correct standard statement as a reference to help users understand and correct errors in the new proposition.

接下来，描述利用预训练语言大模型进行语义分析以识别潜在错误中识别过程的具体步骤：Next, we describe the specific steps of the recognition process in which semantic analysis is performed using a pre-trained language model to identify potential errors:

获取新命题的词法结构信息，在这一阶段，首先接收输入的新命题，并对其进行初步处理以提取出构成该命题的基本词汇单元及其在句子中的位置和作用。这一步骤通常涉及使用自然语言处理技术来解析句子结构，例如通过依存句法分析或成分句法分析等方法来确定各个词汇之间的关系以及它们在句子中的功能角色。Obtaining the lexical structure information of the new proposition. In this stage, the new proposition is first received as input and processed to extract the basic lexical units that constitute the proposition and their position and role in the sentence. This step usually involves using natural language processing technology to parse the sentence structure, such as using dependency parsing or component parsing to determine the relationship between individual words and their functional roles in the sentence.

基于语言模型计算新命题的语法概率得分P（语法），接下来，利用预先训练好的语言模型来评估整个新命题的语法正确性。这个过程涉及到将新命题传递给语言模型，由模型根据其内部学习到的概率分布来计算该命题作为合法句子的可能性。所得出的语法概率得分P（语法）反映了命题在语法层面上的合理性。The grammatical probability score P(grammar) of the new proposition is calculated based on the language model. Next, the pre-trained language model is used to evaluate the grammatical correctness of the entire new proposition. This process involves passing the new proposition to the language model, which calculates the possibility of the proposition being a legal sentence based on the probability distribution learned internally. The resulting grammatical probability score P(grammar) reflects the rationality of the proposition at the grammatical level.

判断语法概率得分P（语法）是否低于第一预设阈值θ1 - 系统随后会检查计算得到的P（语法）值是否低于事先设定的一个阈值θ1。如果P（语法）小于θ1，则表明该新命题在语法层面可能存在错误。阈值的选择需要根据具体应用场景和语言模型的能力来确定，以确保既能有效检测出错误又能避免误报。Determine whether the grammatical probability score P(grammar) is lower than the first preset threshold θ1 - The system then checks whether the calculated P(grammar) value is lower than a pre-set threshold θ1. If P(grammar) is less than θ1, it indicates that the new proposition may have errors at the grammatical level. The choice of threshold needs to be determined based on the specific application scenario and the capabilities of the language model to ensure that errors can be effectively detected while avoiding false positives.

如果P（语法）<θ1，则识别新命题可能存在语法错误，当P（语法）确实低于θ1时，系统会标记该新命题为可能含有语法错误的句子。此时，可以进一步采取措施来修正这些错误，比如提供修改建议或者直接应用自动更正算法来改进句子的语法结构。If P(grammar) < θ1, the new proposition is identified as a sentence that may contain grammatical errors. When P(grammar) is indeed lower than θ1, the system will mark the new proposition as a sentence that may contain grammatical errors. At this point, further measures can be taken to correct these errors, such as providing modification suggestions or directly applying automatic correction algorithms to improve the grammatical structure of the sentence.

通过上述步骤，本发明能够有效地利用预训练语言模型的强大能力来识别并纠正新命题中的语法错误，从而提高文本的整体质量。Through the above steps, the present invention can effectively utilize the powerful ability of the pre-trained language model to identify and correct grammatical errors in new propositions, thereby improving the overall quality of the text.

接下来，描述基于语言模型计算新命题的语法概率得分P（语法）的具体步骤：Next, we describe the specific steps for calculating the grammatical probability score P(grammar) of a new proposition based on the language model:

通过将所述新命题进行分词处理以获得构成该命题的一系列词汇，即词汇序列。这一过程利用了自然语言处理中的分词技术，确保能够准确地识别出新命题中的各个词汇单元，为后续的概率计算提供基础。The new proposition is segmented to obtain a series of words constituting the proposition, namely, a word sequence. This process utilizes the word segmentation technology in natural language processing to ensure that each word unit in the new proposition can be accurately identified, providing a basis for subsequent probability calculations.

通过应用预先训练好的语言模型来计算上述词汇序列的概率得分P（词汇）。这一步骤中，语言模型根据词汇序列中各词汇出现的可能性以及它们之间的组合关系来评估整个序列的概率得分，反映了词汇序列在语言学上的合理性。The probability score P(vocabulary) of the above word sequence is calculated by applying a pre-trained language model. In this step, the language model evaluates the probability score of the entire sequence based on the probability of each word appearing in the word sequence and the combination relationship between them, reflecting the linguistic rationality of the word sequence.

通过分析新命题与上下文之间的关联程度来计算上下文相关性得分P（上下文）。具体而言，可以采用诸如注意力机制等技术来衡量新命题与前后文之间的语义连贯性和一致性，从而量化其在特定上下文中是否合理。The context relevance score P(context) is calculated by analyzing the degree of association between the new proposition and the context. Specifically, techniques such as attention mechanisms can be used to measure the semantic coherence and consistency between the new proposition and the context, thereby quantifying whether it is reasonable in a specific context.

最后，通过公式P（语法）= P（词汇）× P（上下文）来综合计算新命题的语法概率得分P（语法）。这意味着新命题的语法正确性不仅取决于其内部词汇序列的合理性P（词汇），还受到其与上下文环境相匹配的程度P（上下文）的影响。这种综合评估方式有助于更全面地判断新命题的语法质量。Finally, the grammatical probability score P(grammar) of the new proposition is calculated comprehensively by the formula P(grammar) = P(vocabulary) × P(context). This means that the grammatical correctness of the new proposition depends not only on the rationality of its internal vocabulary sequence P(vocabulary), but also on the degree to which it matches the context P(context). This comprehensive evaluation method helps to judge the grammatical quality of the new proposition more comprehensively.

接下来，描述计算新命题的上下文相关性得分P（上下文）的具体步骤：Next, we describe the specific steps for calculating the contextual relevance score P(context) of a new proposition:

通过获取待处理的新命题的前后文信息来开始这一过程。这一步骤涉及识别与新命题直接相邻的文本片段，这些文本片段构成了新命题的上下文环境。例如，在一段连续的文本中，如果新命题是句子"B"，那么句子"A"作为前文和句子"C"作为后文将被提取出来以供后续分析。The process begins by obtaining the contextual information of the new proposition to be processed. This step involves identifying the text fragments directly adjacent to the new proposition, which constitute the context of the new proposition. For example, in a continuous text, if the new proposition is sentence "B", then sentence "A" as the preceding context and sentence "C" as the following context will be extracted for subsequent analysis.

通过利用预先训练好的语言模型来计算前后文信息的相关性得分P（前）和P（后）。具体来说，语言模型会评估句子"A"与句子"B"之间的语义连贯性和句子"C"与句子"B"之间的语义连贯性，并分别给出一个数值表示这种连贯性的程度。例如，可以使用诸如BERT或GPT等先进的语言模型来进行此类计算。The relevance scores P(previous) and P(next) of the context information are calculated by using a pre-trained language model. Specifically, the language model evaluates the semantic coherence between sentence "A" and sentence "B" and the semantic coherence between sentence "C" and sentence "B", and gives a numerical value to indicate the degree of such coherence. For example, advanced language models such as BERT or GPT can be used for such calculations.

接下来判断所得到的相关性得分P（前）和P（后）是否均高于第二预设阈值θ2。这一阈值是用来衡量前后文与新命题之间相关性的最低标准。如果两个得分都超过了这个阈值，则认为新命题与其上下文具有良好的一致性；反之，则认为一致性较差。例如，假设θ2设定为0.6，如果P（前） = 0.75且P（后）= 0.80，则满足条件。Next, determine whether the obtained relevance scores P(pre) and P(post) are both higher than the second preset threshold θ2. This threshold is the minimum standard for measuring the relevance between the context and the new proposition. If both scores exceed this threshold, the new proposition is considered to have good consistency with its context; otherwise, it is considered to have poor consistency. For example, assuming that θ2 is set to 0.6, if P(pre) = 0.75 and P(post) = 0.80, the condition is met.

如果P（前）>θ2且P（后）>θ2，则计算新命题的上下文相关性得分P（上下文），即P（上下文）= P（前）+ P（后）\2。这意味着新命题与前后文的相关性得分将被平均化，以得出一个综合得分，反映新命题在整体上下文中的一致性水平。例如，如果P（前）= 0.75且P（后）=0.80，则P（上下文）=0.75+0.80/2=0.775。If P(pre)>θ2 and P(post)>θ2, then calculate the contextual relevance score P(context) of the new proposition, that is, P(context) = P(pre) + P(post)\2. This means that the relevance scores of the new proposition with the context before and after will be averaged to obtain a comprehensive score that reflects the consistency level of the new proposition in the overall context. For example, if P(pre) = 0.75 and P(post) = 0.80, then P(context) = 0.75 + 0.80/2 = 0.775.

接下来，描述计算前后文信息的相关性得分P（前）和P（后）的具体步骤：Next, the specific steps for calculating the relevance scores P(previous) and P(after) of the context information are described:

首先，提取所述新命题的前后文关键词。这一过程涉及对新命题及其上下文进行分析，以识别出与新命题紧密相关的词汇或短语。这些关键词可以是名词、动词或其他能够反映上下文意义的关键元素。例如，如果新命题是“太阳系中最大的行星是木星”，那么关键词可能包括“太阳系”、“最大”和“木星”。First, extract the contextual keywords of the new proposition. This process involves analyzing the new proposition and its context to identify words or phrases that are closely related to the new proposition. These keywords can be nouns, verbs, or other key elements that reflect the meaning of the context. For example, if the new proposition is "The largest planet in the solar system is Jupiter", then the keywords may include "solar system", "largest", and "Jupiter".

接着，基于所述语言模型计算所述关键词在前后文中出现的概率P（关键词）。这一步骤利用预先训练的语言模型来评估关键词在给定上下文中出现的可能性。例如，在上述例子中，语言模型可能会给出“太阳系”出现的概率较高，而“最大”和“木星”的概率则根据它们在上下文中出现的频率来确定。Next, the probability P(keyword) of the keyword appearing in the context is calculated based on the language model. This step uses a pre-trained language model to evaluate the likelihood of a keyword appearing in a given context. For example, in the above example, the language model may give a high probability for "solar system", while the probabilities of "largest" and "Jupiter" are determined based on how often they appear in the context.

然后，计算所述关键词与所述新命题之间的语义相似度S（语义）。这一步骤通常涉及到使用语义相似度算法来衡量关键词与新命题之间的关系强度。例如，可以使用余弦相似度等方法来量化关键词与新命题之间的语义关联程度。Then, the semantic similarity S (semantic) between the keyword and the new proposition is calculated. This step usually involves using a semantic similarity algorithm to measure the strength of the relationship between the keyword and the new proposition. For example, methods such as cosine similarity can be used to quantify the degree of semantic association between the keyword and the new proposition.

最后，根据公式P（前）/P（后）后 = P（关键词）×S（语义）计算所述相关性得分P（前）和P（后）。这意味着将关键词在上下文中出现的概率与其与新命题的语义相似度相乘，得到一个综合得分，用于表示关键词与新命题在上下文中的相关性。例如，如果“太阳系”在上下文中出现的概率很高，并且与新命题有很强的语义关联，则它将获得较高的相关性得分。Finally, the relevance scores P(front) and P(back) are calculated according to the formula P(front)/P(back) = P(keyword) × S(semantic). This means that the probability of the keyword appearing in the context is multiplied by its semantic similarity to the new proposition to obtain a comprehensive score that represents the relevance of the keyword to the new proposition in the context. For example, if "solar system" has a high probability of appearing in the context and has a strong semantic association with the new proposition, it will receive a high relevance score.

接下来，描述计算关键词与新命题之间语义相似度S（语义）的具体步骤：Next, the specific steps of calculating the semantic similarity S(semantic) between the keyword and the new proposition are described:

通过将关键词和新命题分别向量化表示为向量V关键词和V新命题。这一过程通常涉及利用预先训练好的语言模型或词嵌入技术，将文本中的关键词和新命题转换成数值化的向量形式。例如，可以使用Word2Vec、GloVe或者更先进的BERT等模型来获取这些向量。这些模型能够捕捉到词汇在不同上下文中所携带的意义，并将其映射到一个多维空间中，使得语义相近的词语在该空间中距离较近。The keywords and new propositions are represented by vectors Vkeywords and Vnewpropositions respectively. This process usually involves using a pre-trained language model or word embedding technology to convert the keywords and new propositions in the text into numerical vector form. For example, these vectors can be obtained using models such as Word2Vec, GloVe, or more advanced BERT. These models can capture the meaning of words in different contexts and map them into a multidimensional space so that words with similar semantics are close to each other in this space.

计算向量V关键词和V新命题之间的余弦相似度COS。余弦相似度是一种衡量两个非零向量之间角度的度量方式，用于评估它们方向的一致性。具体而言，可以通过公式COS= V关键词 · V新命题\||V关键词||×||V新命题||来计算，其中·表示向量点乘运算，||V关键词||和||V新命题||分别代表两个向量的模即长度。这样得到的COS值范围在-1到1之间，值越接近1表示两个向量越相似。Calculate the cosine similarity COS between vectors V keyword and V new proposition. Cosine similarity is a measure of the angle between two non-zero vectors, used to evaluate the consistency of their directions. Specifically, it can be calculated by the formula COS = V keyword · V new proposition\||V keyword||×||V new proposition||, where · represents the vector dot multiplication operation, ||V keyword|| and ||V new proposition|| represent the modulus or length of the two vectors respectively. The COS value obtained in this way ranges from -1 to 1. The closer the value is to 1, the more similar the two vectors are.

如果COS>第三预设阈值θ3，则S（语义） = COS。这里引入了一个阈值θ3作为判断标准，只有当计算出的余弦相似度COS大于这个阈值时，才认为关键词与新命题之间存在较高的语义相似度，并将COS赋值给S（语义）。例如，假设θ3被设定为0.7，那么只有当COS大于0.7时，才会认为关键词与新命题在语义上有较强的相关性。这种设定有助于过滤掉那些语义差异较大的关键词组合，从而提高新命题纠错方法的有效性和准确性。If COS> the third preset threshold θ3, then S (semantics) = COS. A threshold θ3 is introduced here as a judgment criterion. Only when the calculated cosine similarity COS is greater than this threshold, it is considered that there is a high semantic similarity between the keyword and the new proposition, and COS is assigned to S (semantics). For example, assuming that θ3 is set to 0.7, then only when COS is greater than 0.7, it is considered that the keyword and the new proposition have a strong semantic correlation. This setting helps to filter out keyword combinations with large semantic differences, thereby improving the effectiveness and accuracy of the new proposition error correction method.

在实际操作过程中，当此装置使用时，首先通过输入接口接收来自大模型生成的新命题及其相关的语境信息。这些信息随后被传递给语言模型模块，该模块基于预训练的语言模型对新命题进行深入的语义分析，以识别其中可能存在的语法或逻辑上的潜在错误。与此同时，领域特定知识库作为另一个关键组件，它包含了与特定领域相关的专业知识和数据，用于进一步验证新命题的准确性。一旦语言模型模块完成了初步的语义分析并将潜在错误标记出来后，这些信息会被发送到知识验证模块，该模块利用领域特定知识库中的数据来确认或否定这些潜在错误。如果发现新命题中确实存在错误，系统会将这些错误连同原始的新命题一起提交给纠错算法模块。纠错算法模块根据接收到的信息以及先前的分析结果和验证反馈，自动修正新命题中的错误，并生成修正后的命题。最后，修正完成的新命题被输出，整个过程实现了从接收新命题到输出准确无误的命题的自动化流程，确保了大模型生成内容的质量和准确性。这一系列步骤不仅提高了处理效率，还保证了最终输出内容的专业性和可靠性。In actual operation, when the device is used, it first receives new propositions generated by the large model and their related contextual information through the input interface. This information is then passed to the language model module, which performs in-depth semantic analysis on the new proposition based on the pre-trained language model to identify potential grammatical or logical errors that may exist in it. At the same time, the domain-specific knowledge base, as another key component, contains professional knowledge and data related to a specific field, which is used to further verify the accuracy of the new proposition. Once the language model module completes the preliminary semantic analysis and marks the potential errors, this information will be sent to the knowledge verification module, which uses the data in the domain-specific knowledge base to confirm or deny these potential errors. If errors are found in the new proposition, the system will submit these errors together with the original new proposition to the error correction algorithm module. The error correction algorithm module automatically corrects the errors in the new proposition based on the received information and the previous analysis results and verification feedback, and generates a corrected proposition. Finally, the corrected new proposition is output. The whole process realizes an automated process from receiving new propositions to outputting accurate propositions, ensuring the quality and accuracy of the content generated by the large model. This series of steps not only improves processing efficiency, but also ensures the professionalism and reliability of the final output content.

所述步骤S3具体包括：The step S3 specifically includes:

， ,

其中，是逻辑一致性得分。in, is the logical consistency score.

在这一实施方式中，纠错方法通过利用领域特定知识库中的数据，对新命题中的每个子命题进行分析。首先，通过从知识库K中提取与每个子命题Pi最接近的命题集合{k1,k2,…,kn}，计算子命题Pi与知识库命题Kj之间的相似度。这一相似度使用余弦相似度公式来计算，确保了对命题之间语义关系的准确量化。接下来，方法基于预定义的规则和逻辑，验证新命题与知识库中现有知识的逻辑一致性。通过结合子命题的语义相似度和逻辑一致性，方法能够计算出新命题的置信度分数C(P)，该分数反映了新命题的整体准确性和与知识库中已知信息的一致性。In this embodiment, the error correction method analyzes each sub-proposition in the new proposition by utilizing data in a domain-specific knowledge base. First, the similarity between the sub-proposition Pi and the knowledge base proposition Kj is calculated by extracting the set of propositions {k1, k2, …, kn} closest to each sub-proposition Pi from the knowledge base K. This similarity is calculated using the cosine similarity formula to ensure accurate quantification of the semantic relationship between propositions. Next, the method verifies the logical consistency of the new proposition with the existing knowledge in the knowledge base based on predefined rules and logic. By combining the semantic similarity and logical consistency of the sub-propositions, the method is able to calculate the confidence score C(P) of the new proposition, which reflects the overall accuracy of the new proposition and its consistency with the known information in the knowledge base.

该实施方式通过将新命题的每个子命题与领域特定知识库中的命题进行比较，有效提高了命题的验证精度。通过计算子命题的相似度和逻辑一致性，该方法能够综合评估新命题的整体准确性，并得出可靠的置信度分数。此过程确保新命题在逻辑和语义上的一致性，从而提升了纠错过程的准确性和可靠性。This implementation effectively improves the verification accuracy of the new proposition by comparing each sub-proposition of the new proposition with the propositions in the domain-specific knowledge base. By calculating the similarity and logical consistency of the sub-propositions, the method can comprehensively evaluate the overall accuracy of the new proposition and derive a reliable confidence score. This process ensures the logical and semantic consistency of the new proposition, thereby improving the accuracy and reliability of the error correction process.

在本实施方式中，可以选择不同类型的相似度计算方法，例如欧氏距离、杰卡德相似系数等，以替代余弦相似度，依据不同应用场景下的需求进行调整。此外，知识库K的构建方式也可以变更，例如引入更多领域特定的语料库或者增强知识库的动态更新能力，以适应不断变化的知识体系。最终的置信度分数C(P)的计算方式也可以通过不同的逻辑规则或权重调整来优化，以进一步提升纠错的准确性。In this embodiment, different types of similarity calculation methods, such as Euclidean distance, Jaccard similarity coefficient, etc., can be selected to replace cosine similarity and adjusted according to the needs of different application scenarios. In addition, the construction method of the knowledge base K can also be changed, such as introducing more field-specific corpora or enhancing the dynamic update capability of the knowledge base to adapt to the ever-changing knowledge system. The calculation method of the final confidence score C(P) can also be optimized through different logical rules or weight adjustments to further improve the accuracy of error correction.

所述步骤S4具体包括：The step S4 specifically includes:

； ;

在这一实施方式中，纠错方法首先通过前一步骤中获得的相似度和逻辑一致性结果，识别出新命题中的错误位置。当子命题Pi的相似度低于预设阈值时，系统会认定该子命题可能包含错误。接下来，方法使用替换策略，选择知识库K中的正确命题集合中与Pi最相似且逻辑一致性最佳的命题Kj进行替换，生成修正后的命题P'。之后，将修正后的命题P'再次通过知识库验证，以确保其在语义和逻辑上的一致性。如果修正后的命题仍未达到预设的置信度要求，将进一步进行调整，直到修正后的命题满足条件，该实施方式能够有效定位并修正命题中的错误，通过结合相似度和逻辑一致性两方面的分析，确保了替换策略的有效性。修正后的命题在经过再次验证后，能够保证其语义和逻辑的一致性，从而提高了纠错的准确性和效率。通过这一过程，能够大幅减少错误命题对后续推理和决策的影响，提高知识库系统的整体可靠性。In this embodiment, the error correction method first identifies the error position in the new proposition through the similarity and logical consistency results obtained in the previous step. When the similarity of the sub-proposition Pi is lower than the preset threshold, the system will determine that the sub-proposition may contain errors. Next, the method uses a replacement strategy to select the proposition Kj that is most similar to Pi and has the best logical consistency in the correct proposition set in the knowledge base K for replacement, and generates a corrected proposition P'. After that, the corrected proposition P' is verified again through the knowledge base to ensure its semantic and logical consistency. If the corrected proposition still does not meet the preset confidence requirements, it will be further adjusted until the corrected proposition meets the conditions. This embodiment can effectively locate and correct errors in the proposition, and ensures the effectiveness of the replacement strategy by combining the analysis of similarity and logical consistency. After the revised proposition is verified again, its semantic and logical consistency can be guaranteed, thereby improving the accuracy and efficiency of error correction. Through this process, the impact of erroneous propositions on subsequent reasoning and decision-making can be greatly reduced, and the overall reliability of the knowledge base system can be improved.

在该实施方式中，可以根据不同应用场景，调整预设的相似度阈值或逻辑一致性标准，从而适应不同的纠错需求。此外，替换策略也可以引入更多的智能化调整机制，例如通过机器学习算法动态优化相似度计算和逻辑一致性分析的权重，进一步提升修正过程的准确性和效率。如果知识库K具有自学习功能，可以在错误修正过程中自动更新命题集合，提高系统对新命题的处理能力。In this implementation, the preset similarity threshold or logical consistency standard can be adjusted according to different application scenarios to adapt to different error correction requirements. In addition, the replacement strategy can also introduce more intelligent adjustment mechanisms, such as dynamically optimizing the weights of similarity calculation and logical consistency analysis through machine learning algorithms to further improve the accuracy and efficiency of the correction process. If the knowledge base K has a self-learning function, the proposition set can be automatically updated during the error correction process to improve the system's ability to handle new propositions.

还提供一种基于大模型新命题纠错系统，用于实现所述的基于大模型新命题纠错方法的步骤，所述系统包括：A system for correcting errors of new propositions based on a large model is also provided, which is used to implement the steps of the method for correcting errors of new propositions based on a large model, and the system comprises:

语境信息获取模块用于接收和处理由大模型生成的新命题及其相关语境信息。该模块通过对话或文本生成系统捕捉大模型输出的语句，并提取相关的上下文，以确保后续处理的准确性。The context information acquisition module is used to receive and process new propositions and their related context information generated by the big model. This module captures the sentences output by the big model through a dialogue or text generation system and extracts the relevant context to ensure the accuracy of subsequent processing.

语义分析模块通过预训练的语言模型对新命题进行深度语义分析。这一模块识别新命题中的潜在错误，具体包括但不限于语法错误、逻辑矛盾或语义不一致等。语义分析模块的核心在于利用语言模型的上下文理解能力，对新命题的语义进行全面分析，以确保其在逻辑和语法上的完整性。The semantic analysis module performs deep semantic analysis on new propositions through pre-trained language models. This module identifies potential errors in new propositions, including but not limited to grammatical errors, logical contradictions, or semantic inconsistencies. The core of the semantic analysis module is to use the contextual understanding ability of the language model to conduct a comprehensive analysis of the semantics of the new proposition to ensure its logical and grammatical integrity.

验证模块则通过调用领域特定的知识库，对新命题的内容进行验证。这些知识库可能包含领域内公认的标准、规则或已知的事实信息。该模块负责检测新命题是否与领域内的已知知识相一致，并标记出潜在的不准确之处。The verification module verifies the content of the new proposition by calling domain-specific knowledge bases. These knowledge bases may contain recognized standards, rules, or known facts in the domain. This module is responsible for checking whether the new proposition is consistent with the known knowledge in the domain and marking potential inaccuracies.

修正模块基于语义分析结果和验证反馈，自动对新命题中的错误进行修正。修正的过程依赖于模型对语义的理解，并结合领域知识库中验证的结果，从而生成一个经过纠错的、符合语境和领域知识的新命题。语境信息获取模块还可以采用不同的获取策略，例如通过实时监控多种来源（如文档生成系统、对话代理）来收集语境信息。此外，语义分析模块可以根据不同应用场景采用不同的预训练语言模型，以适应特定的语言风格或领域要求。验证模块也可以结合多种知识库，或者在缺乏知识库支持时使用外部API进行验证。修正模块还可以结合人工干预，允许用户在自动修正前对建议进行审阅和确认。The correction module automatically corrects errors in new propositions based on semantic analysis results and verification feedback. The correction process relies on the model's understanding of semantics and combines the results of verification in the domain knowledge base to generate a new proposition that is corrected and consistent with the context and domain knowledge. The context information acquisition module can also adopt different acquisition strategies, such as collecting context information through real-time monitoring of multiple sources (such as document generation systems, dialogue agents). In addition, the semantic analysis module can use different pre-trained language models according to different application scenarios to adapt to specific language styles or domain requirements. The verification module can also combine multiple knowledge bases, or use external APIs for verification when there is a lack of knowledge base support. The correction module can also combine manual intervention to allow users to review and confirm suggestions before automatic correction.

尽管已经示出和描述了本发明的实施例，对于本领域的普通技术人员而言，可以理解在不脱离本发明的原理和精神的情况下可以对这些实施例进行多种变化、修改、替换和变型，本发明的范围由所附权利要求及其等同物限定。Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and variations may be made to the embodiments without departing from the principles and spirit of the present invention, and that the scope of the present invention is defined by the appended claims and their equivalents.

Claims

1. A new proposition error correction method based on a large model, the method comprising:

S1: acquiring the generated new proposition and the corresponding context information thereof;

s2: semantic analysis is carried out on the new propositions based on the pre-trained language big model, and potential errors in the new propositions are identified;

S3: verifying the accuracy of the new proposition by using a domain-specific knowledge base;

s4: and automatically correcting errors in the new propositions according to the analysis result and verification feedback.

2. The method for correcting new propositions based on large models according to claim 1, wherein the step S1 of obtaining the generated new propositions and the context information corresponding to the new propositions specifically comprises:

receiving the output new proposition text through an API interface;

extracting key words in the new proposition text;

Building a context based on the key words;

entities and relationships in the new proposition are identified using natural language processing techniques.

3. The method for correcting new propositions based on large models according to claim 1, wherein said step S2 of performing semantic analysis on said new propositions based on pre-trained language large models specifically comprises:

Inputting the new proposition to the pre-trained language big model to obtain a semantic vector representation;

Calculating the similarity S between the new proposition and the standard expression in the field based on the semantic vector representation;

if the similarity S is lower than a preset threshold T, determining that the new proposition possibly has semantic errors;

And comparing the new propositions with key difference points of standard expressions in the field.

4. A new proposition error correction method based on a big model according to claim 3, wherein said calculating the similarity S between the new proposition and the in-domain standard expression based on the semantic vector representation specifically comprises:

Normalizing the semantic vector representation of the new proposition;

calculating cosine similarity C between the normalized semantic vector representation and the intra-domain standard expression vector representation;

if the cosine similarity C is smaller than 7, the new proposition is not matched with the standard expression, namely when C is smaller than 0.7, the new proposition is judged to have larger difference with the standard expression;

adjusting the calculation mode of the similarity S based on the mismatch condition;

The concrete steps of calculating the cosine similarity C between the normalized semantic vector representation and the standard expression vector representation in the field include:

Determining a new proposition semantic vector, representing V1 and a standard expression vector, and representing V2;

calculating the dot product P of V1 and V2;

Calculating the module lengths M1 and M2 of V1 and V2;

Cosine similarity is calculated according to the formula cosine similarity c=p/(m1×m2).

5. The method for correcting new propositions based on large models according to claim 1, wherein the step S2 of performing semantic analysis on the new propositions based on the pre-trained language large models comprises the following specific steps:

Acquiring lexical structure information of new propositions;

Calculating a grammar probability score P (grammar) of the new proposition based on a language model;

judging whether the grammar probability score P (grammar) is lower than a first preset threshold value theta 1 or not;

If P (grammar) < θ1, identifying that a new proposition may have a grammar error;

Wherein, the specific steps of calculating the grammar probability score P (grammar) of the new proposition based on the language model comprise:

word segmentation processing is carried out on the new proposition to obtain a vocabulary sequence;

calculating a probability score P (vocabulary) of the vocabulary sequence based on the language model;

Calculating a context correlation score P (context) for the new proposition;

the grammar probability score P (grammar) is calculated according to the formula P (grammar) =p (vocabulary) ×p (context).

6. The method for error correction of new propositions based on large models according to claim 5, wherein said specific step of calculating a context correlation score P (context) of a new proposition comprises:

Acquiring the context information of the new proposition;

Calculating relevance scores P (front) and P (rear) of the context information based on the language model;

judging whether the correlation scores P (front) and P (rear) are higher than a second preset threshold value theta 2 or not;

If P (front) > θ2 and P (rear) > θ2, then P (context) = (P (front) +p (rear))/2;

Wherein, the specific steps of calculating the relevance scores P (front) and P (back) of the context information based on the language model comprise:

extracting the context keywords of the new proposition;

calculating the probability P (key words) of the key words occurring in the context based on a language model;

calculating semantic similarity S (semantics) between the keywords and the new propositions;

The correlation scores P (front) and P (rear) are calculated according to the formula P (front)/P (rear) =p (keyword) ×s (semantic).

7. The new proposition error correction method based on the big model according to claim 6, wherein the specific step of calculating the semantic similarity S (semantics) between the keyword and the new proposition comprises:

the keyword and the new proposition are respectively and vectorized to be represented as a vector V keyword and a vector V new proposition;

Calculating cosine similarity COS between the vector V key words and the V new proposition;

If COS > the third preset threshold θ3, S (semantic) =cos.

8. The method for error correction of new propositions based on large models according to claim 1, wherein said step S3 specifically comprises:

Extracting relevant data from a predefined domain-specific knowledge base containing a wide range of facts, rules, logical relationships, and domain-specific concepts;

let the knowledge base be K, the new topic be P,

For each sub-proposition Pi in P, find the proposition set K1, K2, …, kn closest to Pi semantics from K,

Calculating the similarity of corresponding propositions Kj in propositions Pi and KThe cosine similarity formula is used:，

Wherein, AndIs a semantic vector representation of the corresponding proposition;

Verifying the logic consistency of the new proposition and the known knowledge according to rules and logic in a knowledge base, evaluating the overall accuracy of the new proposition PPP, comprehensively considering the similarity and logic consistency of all sub propositions, and obtaining the confidence score C (P) of the new proposition:

，

Wherein, Is a logical consistency score.

9. The method for error correction of new propositions based on large models according to claim 1, wherein said step S4 specifically comprises:

determining the position of the error in the proposition based on the similarity and the logical consistency result in the step S3, locating the error point of each sub-proposition Pi if Less than a preset thresholdThen consider that the sub-proposition has an error;

error correction is carried out by utilizing the correct proposition sets { k1, k2, …, kn } in the domain-specific knowledge base, a replacement strategy is used for selecting the proposition Kj with highest similarity with Pi and best logic consistency to replace the error part in the Pi, and the proposition after correction The method comprises the following steps:

；

new propositions to be corrected Again through knowledge base verification, ensuring the consistency of the semantics and logic, and if the revised proposition still does not meet the conditions, further adjusting until the confidence level of the propositionReaching the set threshold value.

10. A big model based new proposition error correction system for implementing the steps of the big model based new proposition error correction method according to any of claims 1-9, characterized in that the system comprises:

the context information acquisition module is used for acquiring the generated new propositions and the corresponding context information thereof;

the semantic analysis module is used for carrying out semantic analysis on the new proposition based on the pre-trained language big model and identifying potential errors in the new proposition;

the verification module utilizes the domain-specific knowledge base to verify the accuracy of the new proposition;

And the correction module automatically corrects errors in the new propositions according to the semantic analysis result and verification feedback.