
CN118468899A - Translation method and device for example-aware machine translation large language model - Google Patents

Translation method and device for example-aware machine translation large language model

Info

Publication number
CN118468899A
CN118468899A (application number CN202410933627.6A)
Authority
CN
China
Prior art keywords
translation
language model
examples
perception
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410933627.6A
Other languages
Chinese (zh)
Other versions
CN118468899B (en)
Inventor
刘学博
李辰
张梅山
张民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology Shenzhen; Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology Shenzhen; Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology Shenzhen and Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology
Priority to CN202410933627.6A
Publication of CN118468899A
Application granted
Publication of CN118468899B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F40/42 Data-driven translation
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to the field of machine translation, and in particular to a translation method and device for an example-aware machine translation large language model. The method includes: constructing sentence-level perception examples and document-level perception examples; constructing example-aware training data from these examples, and obtaining a trained machine translation large language model through low-rank adaptation (LoRA) fine-tuning; constructing domain translation examples and document-level translation examples; and optimizing the trained model with these translation examples to obtain the constructed machine translation large language model with improved example-awareness, which then produces the translation result. The invention not only provides customized solutions for specific needs such as domain-specific translation and document-level translation, but also significantly improves translation efficiency and performance without sacrificing translation quality.

Description

Translation method and device for an example-aware machine translation large language model

Technical Field

The present invention relates to the technical field of machine translation, and in particular to a translation method and device for an example-aware machine translation large language model.

Background Art

As large language model (LLM) technology has made significant progress in academia and industry and has shown impressive performance in machine translation, machine translation LLMs have become a popular research topic in natural language processing. By providing relevant examples for in-context learning, or by fine-tuning an LLM through supervised learning on translation data, the model can be adapted to translation tasks across multiple languages and demonstrates excellent translation capability; this direction has therefore received wide attention from researchers in recent years.

Current machine translation LLM techniques fall into prompt-based methods and fine-tuning-based methods. Prompt-based approaches activate the translation ability of a base LLM by supplying examples, offering a training-free method that can be deployed quickly for specific translation needs (such as domain-specific translation). However, this approach is constrained by the model's fixed parameters, which limits its adaptability and translation depth. In contrast, fine-tuning-based approaches enhance translation capability through supervised fine-tuning on machine-translation-specific datasets, improving performance through parameter updates. But this approach often overlooks the in-context learning ability of LLMs; especially in translation scenarios that require example awareness, sentence-level translation driven by simple instructions alone limits the model's translation ability.
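The prompt-based approach described above can be sketched minimally: a few demonstration pairs are concatenated ahead of the source sentence, and the resulting prompt is given to a frozen LLM (not shown). The template wording and the demonstration pairs below are illustrative assumptions, not taken from the patent.

```python
# Sketch of a prompt-based few-shot translation setup: k in-context example
# pairs are prepended to the source sentence to activate translation behavior.
def build_few_shot_prompt(examples, source, src_lang="German", tgt_lang="English"):
    """examples: list of (source, target) pairs used as in-context demonstrations."""
    lines = []
    for src, tgt in examples:
        lines.append(f"{src_lang}: {src}\n{tgt_lang}: {tgt}")
    # The final block leaves the target side empty for the model to complete.
    lines.append(f"{src_lang}: {source}\n{tgt_lang}:")
    return "\n\n".join(lines)

demo = [("Guten Morgen.", "Good morning."), ("Danke schön.", "Thank you very much.")]
prompt = build_few_shot_prompt(demo, "Wie geht es dir?")
print(prompt)
```

In practice the number and choice of demonstrations would follow the selection strategies described later in this document.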

Summary of the Invention

To address the shortcomings of example perception and learning in existing machine translation, and to overcome the limitations of prompt-based and fine-tuning-based machine translation LLMs so as to achieve more efficient and accurate machine translation, embodiments of the present invention provide a translation method and device for an example-aware machine translation large language model. The technical solution is as follows:

In one aspect, a translation method for an example-aware machine translation large language model is provided. The method is implemented by a translation device for the machine translation large language model and includes:

S1. Obtain the data to be translated.

S2. Input the data into the constructed machine translation large language model with improved example-awareness.

S3. Obtain the translation result from the data and the example-aware machine translation large language model.

The construction process of the machine translation large language model with improved example-awareness includes:

S21. Construct sentence-level perception examples and document-level perception examples.

S22. Construct example-aware training data from the sentence-level and document-level perception examples; from this training data, obtain a trained machine translation large language model through low-rank adaptation (LoRA) fine-tuning.

S23. Construct domain translation examples and document-level translation examples.

S24. Optimize the trained machine translation large language model for translation using the domain translation examples and document-level translation examples, obtaining the constructed machine translation large language model with improved example-awareness.

Optionally, constructing the sentence-level and document-level perception examples in S21 includes:

S211. Obtain the original training set.

S212. Randomly select multiple translation pairs from the original training set and use them as sentence-level perception examples.

S213. Select any translation pair in the original training set, obtain the multiple translation pairs immediately preceding it, and use those pairs as document-level perception examples.

Optionally, a sentence-level perception example is given by formula (1):

$E_i = f(x_i, y_i),\ i = 1, 2, \dots, k$ (1)

where $E_i$ denotes the sentence-level perception example of the $i$-th translation pair, $k$ denotes the number of translation pairs, $f$ denotes the example format, $x_i$ denotes the source sentence of the $i$-th translation pair, and $y_i$ denotes the target sentence of the $i$-th translation pair.

Optionally, a document-level perception example is given by formula (2):

$E_j^{doc} = f(x_{j-m}, y_{j-m};\ \dots;\ x_{j-1}, y_{j-1})$ (2)

where $E_j^{doc}$ denotes the document-level perception example, $j$ denotes the position of the sentence in the original training set, $m$ denotes the number of translation pairs, $f$ denotes the example format, $x_{j-i}$ denotes the $i$-th source sentence preceding the $j$-th sentence, and $y_{j-i}$ denotes the corresponding target sentence.
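Steps S211 through S213 can be sketched as follows. The function names and the toy training pairs are hypothetical; the sketch only contrasts random sentence-level sampling with taking the m pairs immediately preceding a position for the document-level case.

```python
import random

def sentence_level_examples(train_pairs, k, rng):
    """S212 sketch: randomly sample k translation pairs from the
    same-domain training set."""
    return rng.sample(train_pairs, k)

def document_level_examples(train_pairs, j, m):
    """S213 sketch: take up to m translation pairs immediately preceding
    position j, preserving their document order."""
    return train_pairs[max(0, j - m):j]

rng = random.Random(0)
pairs = [(f"src{i}", f"tgt{i}") for i in range(10)]
sent = sentence_level_examples(pairs, k=3, rng=rng)
doc = document_level_examples(pairs, j=5, m=2)
print(doc)  # [('src3', 'tgt3'), ('src4', 'tgt4')]
```

The document-level slice keeps order because, as the detailed description notes, the training set consists of coherent paragraphs whose logical flow matters for in-context learning.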

Optionally, constructing the example-aware training data in S22 and obtaining the trained machine translation large language model through LoRA fine-tuning includes:

S221. Concatenate the sentence-level and document-level perception examples with the original training data, and mix the concatenated instances by a Bernoulli probability to obtain the example-aware training data.
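The Bernoulli mixing in S221 can be sketched as below. The mixing probability p and the instance strings are illustrative assumptions; the patent does not fix a value here.

```python
import random

def mix_training_data(plain_instances, example_prefixed, p, rng):
    """S221 sketch: for each training instance, keep the example-prefixed
    variant with Bernoulli probability p, otherwise keep the plain one."""
    mixed = []
    for plain, prefixed in zip(plain_instances, example_prefixed):
        mixed.append(prefixed if rng.random() < p else plain)
    return mixed

rng = random.Random(42)
plain = [f"translate: src{i} -> tgt{i}" for i in range(6)]
prefixed = [f"examples + translate: src{i} -> tgt{i}" for i in range(6)]
mixed = mix_training_data(plain, prefixed, p=0.5, rng=rng)
print(sum(m.startswith("examples") for m in mixed), "of", len(mixed), "use examples")
```

Mixing plain and example-prefixed instances keeps the model able to translate both with and without demonstrations at inference time.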

S222. Obtain a base machine translation large language model.

S223. Add tunable parameters to the base model using the low-rank adaptation (LoRA) fine-tuning technique, obtaining a machine translation large language model with tunable parameters.

S224. Train the parameter-tunable model on the example-aware training data, obtaining the trained machine translation large language model.
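A minimal numeric sketch of the LoRA idea in S223: the frozen weight matrix W is augmented with a trainable low-rank update scaled by alpha/r, so during fine-tuning only the small matrices A and B would be updated. The matrix sizes and values are illustrative only, not parameters from the patent.

```python
# LoRA forward pass sketch: y = W x + (alpha/r) * B (A x),
# where W stays frozen and only A (r x d_in) and B (d_out x r) are trainable.
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    base = matvec(W, x)   # frozen path: W x
    low = matvec(A, x)    # down-projection into the rank-r space
    up = matvec(B, low)   # up-projection back to the output dimension
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, up)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 weight
A = [[0.1, 0.2]]              # rank r=1 down-projection (1x2)
B = [[0.5], [0.0]]            # up-projection (2x1)
y = lora_forward(W, A, B, [1.0, 1.0], alpha=2.0, r=1)
print(y)
```

Because A and B together hold far fewer values than W, this is what makes the fine-tuning in S223-S224 lightweight.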

Optionally, the training loss of the parameter-tunable machine translation large language model in S223 is given by formula (3):

$\mathcal{L}(\theta) = \sum_{(x, y) \in D} \mathrm{CE}\big(P(y \mid I, E, x;\ \Theta, \theta)\big)$ (3)

where $\mathcal{L}$ denotes the training loss of the parameter-tunable model, $\mathrm{CE}$ denotes the cross-entropy loss, $P(y \mid \cdot)$ denotes the probability distribution of the target sentence $y$, $y$ denotes the target sentence, $x$ denotes the source sentence to be translated, $I$ denotes the translation instruction, $E$ denotes the perception examples, $D$ denotes the set of $N$ translation pairs, $\Theta$ denotes the parameters of the base machine translation large language model, which are frozen during training, and $\theta$ denotes the tunable parameters.
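As a toy illustration of the loss in formula (3): the cross-entropy term is the negative log-likelihood of each gold target token, summed over the target sentence and then over the translation pairs in D. The token probabilities below are made up for illustration.

```python
import math

def sequence_nll(gold_token_probs):
    """Cross-entropy of one target sentence: minus the sum of log-probabilities
    the model assigns to each gold target token."""
    return -sum(math.log(p) for p in gold_token_probs)

def dataset_loss(per_sentence_probs):
    """Sum the per-sentence cross-entropy over the set D of translation pairs."""
    return sum(sequence_nll(probs) for probs in per_sentence_probs)

# Two toy translation pairs: one 2-token target, one 1-token target.
loss = dataset_loss([[0.5, 0.25], [0.5]])
print(round(loss, 4))  # 4 * ln(2) ≈ 2.7726
```

In the actual method only the LoRA parameters $\theta$ would receive gradients from this loss; the base parameters $\Theta$ stay frozen.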

Optionally, constructing the domain translation examples and document-level translation examples in S23 includes:

S231. Obtain the original training set.

S232. Score and rank the translation pairs in the original training set using the R-BM25 retrieval method, and select the pairs whose scores exceed a preset threshold as domain translation examples.

S233. For a target sentence in the test set, obtain the source and target sentences of the multiple sentences preceding it, input them into the trained machine translation large language model to obtain model-generated translation examples, and obtain the document-level translation examples from the domain translation examples together with the model-generated translation examples.
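S232 relies on retrieval-based scoring. As a hedged sketch, the snippet below implements plain BM25 ranking, which the R-BM25 retrieval named in the patent builds on; the k1 and b values are conventional defaults and the toy corpus is an assumption, not data from the patent.

```python
import math
from collections import Counter

def bm25_score(query_tokens, doc_tokens, corpus, k1=1.5, b=0.75):
    """Plain BM25 relevance of one tokenized document to a query,
    computed over a small tokenized corpus."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query_tokens:
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)  # smoothed IDF
        denom = tf[term] + k1 * (1 - b + b * len(doc_tokens) / avgdl)
        score += idf * tf[term] * (k1 + 1) / denom
    return score

corpus = [["machine", "translation", "model"],
          ["large", "language", "model"],
          ["neural", "networks"]]
scores = [bm25_score(["translation", "model"], d, corpus) for d in corpus]
best = max(range(len(corpus)), key=lambda i: scores[i])
print(best)  # 0
```

In S232 such scores would be thresholded to pick the most domain-relevant translation pairs as demonstrations.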

In another aspect, a translation device based on an example-aware machine translation large language model is provided. The device applies the translation method described above and includes:

An acquisition module, configured to obtain the data to be translated.

An input module, configured to input the data into the constructed machine translation large language model with improved example-awareness.

An output module, configured to obtain the translation result from the data and the example-aware machine translation large language model.

The construction process of the machine translation large language model with improved example-awareness includes:

S21. Construct sentence-level perception examples and document-level perception examples.

S22. Construct example-aware training data from the sentence-level and document-level perception examples; from this training data, obtain a trained machine translation large language model through low-rank adaptation (LoRA) fine-tuning.

S23. Construct domain translation examples and document-level translation examples.

S24. Optimize the trained machine translation large language model for translation using the domain translation examples and document-level translation examples, obtaining the constructed machine translation large language model with improved example-awareness.

Optionally, the input module is further configured to:

S211. Obtain the original training set.

S212. Randomly select multiple translation pairs from the original training set and use them as sentence-level perception example training data.

S213. Select any translation pair in the original training set, obtain the multiple translation pairs immediately preceding it, and use those pairs as document-level perception example training data.

Optionally, a sentence-level perception example is given by formula (1):

$E_i = f(x_i, y_i),\ i = 1, 2, \dots, k$ (1)

where $E_i$ denotes the sentence-level perception example of the $i$-th translation pair, $k$ denotes the number of translation pairs, $f$ denotes the example format, $x_i$ denotes the source sentence of the $i$-th translation pair, and $y_i$ denotes the target sentence of the $i$-th translation pair.

Optionally, a document-level perception example is given by formula (2):

$E_j^{doc} = f(x_{j-m}, y_{j-m};\ \dots;\ x_{j-1}, y_{j-1})$ (2)

where $E_j^{doc}$ denotes the document-level perception example, $j$ denotes the position of the sentence in the original training set, $m$ denotes the number of translation pairs, $f$ denotes the example format, $x_{j-i}$ denotes the $i$-th source sentence preceding the $j$-th sentence, and $y_{j-i}$ denotes the corresponding target sentence.

Optionally, the input module is further configured to:

S221. Concatenate the sentence-level and document-level perception examples with the original training data, and mix the concatenated instances by a Bernoulli probability to obtain the example-aware training data.

S222. Obtain a base machine translation large language model.

S223. Add tunable parameters to the base model using the low-rank adaptation (LoRA) fine-tuning technique, obtaining a machine translation large language model with tunable parameters.

S224. Train the parameter-tunable model on the example-aware training data, obtaining the trained machine translation large language model.

Optionally, the training loss of the parameter-tunable machine translation large language model is given by formula (3):

$\mathcal{L}(\theta) = \sum_{(x, y) \in D} \mathrm{CE}\big(P(y \mid I, E, x;\ \Theta, \theta)\big)$ (3)

where $\mathcal{L}$ denotes the training loss of the parameter-tunable model, $\mathrm{CE}$ denotes the cross-entropy loss, $P(y \mid \cdot)$ denotes the probability distribution of the target sentence $y$, $y$ denotes the target sentence, $x$ denotes the source sentence to be translated, $I$ denotes the translation instruction, $E$ denotes the perception examples, $D$ denotes the set of $N$ translation pairs, $\Theta$ denotes the parameters of the base machine translation large language model, which are frozen during training, and $\theta$ denotes the tunable parameters.

Optionally, the input module is further configured to:

S231. Obtain the original training set.

S232. Score and rank the translation pairs in the original training set using the R-BM25 retrieval method, and select the pairs whose scores exceed a preset threshold as domain translation examples.

S233. For a target sentence in the test set, obtain the source and target sentences of the multiple sentences preceding it, input them into the trained machine translation large language model to obtain model-generated translation examples, and obtain the document-level translation examples from the domain translation examples together with the model-generated translation examples.

In another aspect, a translation device for a machine translation large language model is provided, including: a processor; and a memory storing computer-readable instructions that, when executed by the processor, implement any of the above translation methods for the example-aware machine translation large language model.

In another aspect, a computer-readable storage medium is provided, storing at least one instruction that is loaded and executed by a processor to implement any of the above translation methods for the example-aware machine translation large language model.

The beneficial effects of the technical solution provided by the embodiments of the present invention include at least the following:

In the embodiments of the present invention, a lightweight example-aware fine-tuning method and an example screening-and-mixing strategy effectively integrate the prompt-based and fine-tuning-based approaches, improving the adaptability and performance of the translation model across various translation scenarios. In addition, the invention pays particular attention to using examples to facilitate the translation of rare words, while proposing strategies to mitigate the negative impact that noisy demonstrations may have on translation quality. In this way, the invention not only provides customized solutions for specific needs such as domain-specific translation and document-level translation, but also significantly improves translation efficiency and performance without sacrificing translation quality.

BRIEF DESCRIPTION OF THE DRAWINGS

To more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a translation method for an example-aware machine translation large language model provided by an embodiment of the present invention;

FIG. 2 is an overall flowchart of the translation method for an example-aware machine translation large language model provided by an embodiment of the present invention;

FIG. 3 is a flowchart of the example-aware training method provided by an embodiment of the present invention;

FIG. 4 is a flowchart of the example-aware inference method provided by an embodiment of the present invention;

FIG. 5 is a block diagram of a translation device based on an example-aware machine translation large language model provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a translation device for a machine translation large language model provided by an embodiment of the present invention.

DETAILED DESCRIPTION

The technical solution of the present invention is described below with reference to the accompanying drawings.

In the embodiments of the present invention, words such as "exemplarily" and "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as an "example" in the present invention should not be interpreted as preferred over or more advantageous than other embodiments or designs; rather, the word "example" is intended to present a concept in a concrete way. In addition, in the embodiments of the present invention, "and/or" can mean both, or either of the two.

In the embodiments of the present invention, "image" and "picture" are sometimes used interchangeably; when the distinction is not emphasized, their intended meanings are the same. Likewise, "of", "relevant", and "corresponding" are sometimes used interchangeably; when the distinction is not emphasized, their intended meanings are the same.

In the embodiments of the present invention, a subscripted symbol such as W1 may sometimes be written in non-subscript form as W1; when the distinction is not emphasized, the intended meanings are the same.

To make the technical problems, technical solutions, and advantages addressed by the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.

An embodiment of the present invention provides a translation method for an example-aware machine translation large language model. The method can be implemented by a translation device for the machine translation large language model, which may be a terminal or a server. As shown in the flowchart of FIG. 1, the processing flow of the method may include the following steps:

S1. Obtain the data to be translated.

S2. Input the data into the constructed machine translation large language model with improved example-awareness.

S3. Obtain the translation result from the data and the example-aware machine translation large language model.

As shown in FIG. 2, the construction process of the machine translation large language model with improved example-awareness may include the following steps S21-S24:

S21. Construct sentence-level perception examples and document-level perception examples.

Optionally, the above step S21 may include the following steps S211-S213:

S211. Obtain the original training set.

S212. Randomly select multiple translation pairs from the original training set and use them as sentence-level perception examples.

In one feasible implementation, the random selection exploits the property that the original training data of the machine translation large language model all belong to the same domain.

Specifically, in the usual in-context learning framework, the model translates the target sentence by selecting $k$ samples as perception examples $E$; the probability distribution of the target sentence $y$ is as follows:

$P(y \mid x, E) = \prod_{t=1}^{T} P(y_t \mid y_{<t}, x, E)$ (1)

where $P(y \mid x, E)$ denotes the probability distribution of the target sentence $y$ and $T$ denotes the length of the target sentence.

By leveraging learned knowledge and contextual information, the model can generate more accurate and natural translation results.

FIG. 3 shows the specific flow of example-aware training. For sentence-level translation, considering that the original training data all belong to the same domain, the present invention randomly selects $k$ translation pairs from the original training set as its sentence-level perception examples, structured as follows:

$E^{sent} = f(x_1, y_1) \,\|\, f(x_2, y_2) \,\|\, \dots \,\|\, f(x_k, y_k)$ (2)

where $f(x_i, y_i)$ denotes the formatted $i$-th translation pair, $k$ denotes the number of translation pairs, $x_i$ and $y_i$ denote the source and target sentences of the $i$-th translation pair, and $\|$ denotes the separator between examples.

Therefore, the probability of the translation is $P(y \mid x, E^{sent})$. During training, the sentence-level examples are selected from the whole training data.

S213、选取原始训练集中的任一翻译对,获取原始训练集中所选取的翻译对的前多个翻译对,将获取的多个翻译对作为文档级感知示例。S213: Select any translation pair in the original training set, obtain the first multiple translation pairs of the selected translation pair in the original training set, and use the obtained multiple translation pairs as document-level perception examples.

一种可行的实施方式中,利用机器翻译大语言模型的原始训练数据都是有逻辑的段落的特性,进行文档级选取。In a feasible implementation, document-level selection is performed by utilizing the fact that the original training data of the large language model of machine translation are all logical paragraphs.

具体地，对于文档级的翻译，考虑到在模型的上下文学习过程中，理解上下文中的逻辑关系至关重要，而训练集通常由连续的段落而非不相关的句子组成。为此，本发明为原始训练集中的每对平行训练数据，随机选取其前面紧邻的 $1$ 到 $k$ 个示例，作为本发明的文档级感知示例，具体结构如下：Specifically, for document-level translation, understanding the logical relationships in the context is crucial during the model's in-context learning, and the training set usually consists of continuous paragraphs rather than unrelated sentences. To this end, for each pair of parallel training data in the original training set, the present invention randomly selects the $1$ to $k$ examples immediately preceding it as document-level perception examples; the specific structure is as follows:

$D^{\mathrm{doc}}_i = F(x_{i-j}, y_{i-j}) \parallel \cdots \parallel F(x_{i-1}, y_{i-1}), \quad 1 \le j \le k$　(3)

式中，$D^{\mathrm{doc}}_i$ 表示文档级感知示例，$i$ 表示句子在原始训练集中的顺序，$k$ 表示翻译对的数量，$F$ 表示示例的格式，$x_{i-j}$ 表示在第 $i$ 个句子前面的第 $j$ 个源句子，$y_{i-j}$ 表示在第 $i$ 个句子前面的第 $j$ 个目标句子。In the formula, $D^{\mathrm{doc}}_i$ denotes the document-level perception example, $i$ denotes the order of the sentence in the original training set, $k$ denotes the number of translation pairs, $F$ denotes the format of the example, $x_{i-j}$ denotes the $j$-th source sentence before the $i$-th sentence, and $y_{i-j}$ denotes the $j$-th target sentence before the $i$-th sentence.
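文档级示例选取（公式(3)）可以草绘如下：为第 $i$ 个翻译对取其前面紧邻的 $1$ 到 $k$ 个翻译对；函数名与示例模板均为假设。A sketch of document-level example selection (Eq. (3)): take the 1..k pairs immediately preceding pair i; the function name and template are assumptions.

```python
import random

def build_document_demos(train_pairs, i, k, seed=None):
    """为第 i 个翻译对选取其前面紧邻的 1 到 k 个翻译对（公式(3)）。
    Select the 1..k pairs immediately preceding pair i (Eq. (3))."""
    rng = random.Random(seed)
    j = rng.randint(1, min(k, i))  # 实际使用的前置翻译对数量 (number of preceding pairs used)
    window = train_pairs[i - j:i]  # 紧邻第 i 对之前的 j 个翻译对
    return "\n".join(f"Source: {src}\nTarget: {tgt}" for src, tgt in window)

doc_pairs = [("第一句", "Sentence one"), ("第二句", "Sentence two"),
             ("第三句", "Sentence three"), ("第四句", "Sentence four")]
demos = build_document_demos(doc_pairs, i=3, k=2, seed=0)
```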

本发明的各种示例样式如表1所示:Various exemplary forms of the present invention are shown in Table 1:

表1Table 1

S22、根据句子级感知示例以及文档级感知示例,构造示例感知训练数据;根据示例感知训练数据,通过低秩适应LoRA微调技术,得到训练好的机器翻译大语言模型。S22. Construct example-aware training data based on sentence-level perception examples and document-level perception examples; based on the example-aware training data, obtain a trained large language model for machine translation through low-rank adaptive LoRA fine-tuning technology.

一种可行的实施方式中,将构造好的句子级和文档级训练数据进行混合,使用LoRA(Low-Rank Adaptation,低秩适应)方法,进行快速高效的训练。In a feasible implementation, the constructed sentence-level and document-level training data are mixed, and the LoRA (Low-Rank Adaptation) method is used for fast and efficient training.

可选地,上述步骤S22可以包括如下步骤S221- S224:Optionally, the above step S22 may include the following steps S221-S224:

S221、将句子级感知示例以及文档级感知示例与原始训练数据拼接,拼接后的训练数据通过伯努利概率进行混合,得到示例感知训练数据。S221. Concatenate the sentence-level perception examples and the document-level perception examples with the original training data, and mix the concatenated training data through Bernoulli probability to obtain example-perception training data.

一种可行的实施方式中，在构造好句子级感知示例和文档级感知示例后，考虑到机器翻译大语言模型面临的不同应用场景，本发明通过伯努利分布随机选择使用的示例类型来进行混合，这种混合使用方法提高了模型在不同翻译场景下的灵活性和效率，具体的分布如下：In a feasible implementation, after constructing the sentence-level and document-level perception examples, considering the different application scenarios faced by the machine translation large language model, the present invention randomly selects the example type to use via a Bernoulli distribution. This mixed-use method improves the flexibility and efficiency of the model in different translation scenarios. The specific distribution is as follows:

$D = \begin{cases} D^{\mathrm{sent}}, & \text{概率为 } p \\ D^{\mathrm{doc}}, & \text{概率为 } 1-p \end{cases}$　(4)

其中，$p$ 为在训练过程中选择句子级演示 $D^{\mathrm{sent}}$ 的概率，而选择文档级演示 $D^{\mathrm{doc}}$ 的概率为 $1-p$。Here, $p$ is the probability of selecting sentence-level demonstrations $D^{\mathrm{sent}}$ during training, and the probability of selecting document-level demonstrations $D^{\mathrm{doc}}$ is $1-p$.
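公式(4)的伯努利混合可以这样草绘（$p$ 的取值为假设，训练时按实际场景设定）。A sketch of the Bernoulli mixing in Eq. (4); the value of p is an assumption to be set per training scenario.

```python
import random

def sample_demo_type(p, rng=random):
    """以概率 p 选句子级示例，概率 1-p 选文档级示例（公式(4)）。
    Choose sentence-level demos with probability p, document-level with 1-p."""
    return "sentence" if rng.random() < p else "document"

# 边界检查：p=1 时恒为句子级，p=0 时恒为文档级
# Boundary check: p=1 always sentence-level, p=0 always document-level
always_sent = {sample_demo_type(1.0) for _ in range(100)}
always_doc = {sample_demo_type(0.0) for _ in range(100)}
```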

S222、获取基础的机器翻译大语言模型。S222. Obtain a basic large language model for machine translation.

S223、采用低秩适应LoRA微调技术,对基础的机器翻译大语言模型添加可调参数,得到参数可调的机器翻译大语言模型。S223. Use low-rank adaptive LoRA fine-tuning technology to add adjustable parameters to the basic machine translation large language model to obtain a machine translation large language model with adjustable parameters.

一种可行的实施方式中，为了进一步优化模型性能，本发明引入了轻量级LoRA微调技术。通过在模型中添加少量可调参数，在保证模型效率的同时，提升了模型的学习能力和适应性。这种方法既能够充分利用预训练模型的能力，又能针对特定任务进行有效的微调。给定基本模型的参数 $\Theta$，本发明在模型中添加了额外的可调参数 $\Delta\Theta$。因此，训练样本 $(x, y, D)$ 的训练损失为：In a feasible implementation, in order to further optimize model performance, the present invention introduces lightweight LoRA fine-tuning. By adding a small number of adjustable parameters to the model, the learning ability and adaptability of the model are improved while ensuring efficiency. This method both fully utilizes the capabilities of the pre-trained model and effectively fine-tunes it for specific tasks. Given the parameters $\Theta$ of the base model, the present invention adds additional adjustable parameters $\Delta\Theta$ to the model. Therefore, the training loss of a training sample $(x, y, D)$ is:

$\mathcal{L}(\Delta\Theta) = \mathrm{CE}\big(P(y \mid D, x, I; \hat{\Theta}),\ y\big), \quad \hat{\Theta} = \Theta + \Delta\Theta$　(5)

式中，$\mathcal{L}$ 表示参数可调的大型翻译模型的训练损失，$\mathrm{CE}$ 表示交叉熵损失，$P(y \mid D, x, I; \hat{\Theta})$ 表示目标句子 $y$ 的概率分布，$y$ 表示目标句子，$x$ 表示翻译的源句子，$I$ 表示翻译中的指令，$\hat{\Theta}$ 与 $\Theta + \Delta\Theta$ 等价，$\Theta$ 表示在训练过程中被冻结的参数。In the formula, $\mathcal{L}$ denotes the training loss of the large translation model with adjustable parameters, $\mathrm{CE}$ denotes the cross-entropy loss, $P(y \mid D, x, I; \hat{\Theta})$ denotes the probability distribution of the target sentence $y$, $y$ denotes the target sentence, $x$ denotes the source sentence to be translated, $I$ denotes the translation instruction, $\hat{\Theta}$ is equivalent to $\Theta + \Delta\Theta$, and $\Theta$ denotes the parameters frozen during training.

由于本发明使用了轻量级的LoRA微调，因此 $\Delta\Theta$ 和 $\Theta$ 之间的参数大小的比较应该满足：Since the present invention uses lightweight LoRA fine-tuning, the comparison of the parameter sizes of $\Delta\Theta$ and $\Theta$ should satisfy:

$|\Delta\Theta| \ll |\Theta|$　(6)
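公式(5)(6)背后的 LoRA 低秩更新（单层权重 $W' = W + BA$）可用 NumPy 草绘如下；隐藏维度与秩均为假设值，仅用于说明 $|\Delta\Theta| \ll |\Theta|$ 的量级关系。A NumPy sketch of the LoRA low-rank update behind Eqs. (5)-(6) for a single weight matrix; the hidden size and rank are assumed values, shown only to illustrate the magnitude relation in Eq. (6).

```python
import numpy as np

d, r = 1024, 8                      # 隐藏维度与 LoRA 秩（假设值 / assumed values）
W = np.random.randn(d, d)           # 冻结的基础权重，属于 Theta (frozen base weight)
A = np.random.randn(r, d) * 0.01    # 可训练低秩因子 (trainable low-rank factor)
B = np.zeros((d, r))                # B 初始化为零，保证初始时 W' == W
W_adapted = W + B @ A               # 有效权重，对应 Theta + Delta-Theta

base_params = W.size                # 该层中 |Theta| 的参数量
lora_params = A.size + B.size       # 该层中 |Delta-Theta| 的参数量
```

此处 `lora_params` 为 $2 \times d \times r$，远小于 `base_params` 的 $d^2$，即满足公式(6)。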

S224、根据示例感知训练数据,对参数可调的机器翻译大语言模型进行训练,得到训练好的机器翻译大语言模型。S224. According to the example perception training data, a large machine translation language model with adjustable parameters is trained to obtain a trained large machine translation language model.

S23、构造领域翻译示例以及文档级翻译示例。S23. Construct domain translation examples and document-level translation examples.

可选地,上述步骤S23可以包括如下步骤S231- S233:Optionally, the above step S23 may include the following steps S231-S233:

S231、获取原始训练集。S231. Obtain an original training set.

S232、根据R-BM25检索方法,对原始训练集中的翻译对进行打分排序,选取打分超过预设阈值的翻译对作为领域翻译示例。S232. According to the R-BM25 retrieval method, the translation pairs in the original training set are scored and sorted, and the translation pairs with scores exceeding a preset threshold are selected as domain translation examples.

一种可行的实施方式中，如图4所示，利用R-BM25方法对示例进行打分排序，本发明选取分数最高的几个示例，作为推理时使用的高质量示例。In a feasible implementation, as shown in FIG4, the examples are scored and ranked using the R-BM25 method, and the present invention selects the several highest-scoring examples as the high-quality examples used during inference.

其中,R-BM25检索方法:本发明首先使用常规的BM25进行示例的筛选,之后从检索到的BM25示例的翻译源语句和测试集目标句子中提取所有单词ngram及其计数,让S和Q分别表示来自BM25检索示例的翻译源语句ngram和测试集目标句子ngram的集合,根据ngram的匹配程度进行重新排名,使用以下公式实现R-BM25分数的计算:Among them, the R-BM25 retrieval method: the present invention first uses conventional BM25 to screen examples, then extracts all word ngrams and their counts from the translated source sentences of the retrieved BM25 examples and the target sentences of the test set, let S and Q represent the sets of translated source sentence ngrams and target sentence ngrams from the BM25 retrieval examples and the test set, respectively, re-rank according to the matching degree of ngrams, and use the following formula to calculate the R-BM25 score:

$p_n = \dfrac{\sum_{g \in \mathrm{ngram}_n(Q)} \min\big(C_S(g),\ C_Q(g)\big)}{\sum_{g \in \mathrm{ngram}_n(Q)} C_Q(g)}$　(7)

$\mathrm{R\text{-}BM25} = \dfrac{1}{N} \sum_{n=1}^{N} p_n$　(8)

式中，$p_n$ 表示 $n$-gram 的匹配率，$C_S(g)$ 和 $C_Q(g)$ 分别表示 $n$-gram $g$ 在 $S$ 和 $Q$ 中的计数，$N$ 表示计算的 $n$-gram 阶数数量，$\mathrm{R\text{-}BM25}$ 表示 R-BM25 分数。In the formula, $p_n$ denotes the matching rate of $n$-grams, $C_S(g)$ and $C_Q(g)$ denote the counts of the $n$-gram $g$ in $S$ and $Q$ respectively, $N$ denotes the number of $n$-gram orders computed, and $\mathrm{R\text{-}BM25}$ denotes the R-BM25 score.
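公式(7)(8)的 $n$-gram 匹配率与 R-BM25 打分可草绘如下；其中"截断计数"（matched 取 min）是对匹配规则的一种合理假设。A sketch of Eqs. (7)-(8); the clipped-count detail (taking the min of counts) is a plausible assumption about the matching rule.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """统计 token 序列的 n-gram 计数 / n-gram counts of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def r_bm25_score(cand_tokens, query_tokens, max_n=4):
    """对 1..max_n 阶 n-gram 计算匹配率 p_n（公式(7)）并取平均（公式(8)）。"""
    rates = []
    for n in range(1, max_n + 1):
        q = ngram_counts(query_tokens, n)
        if not q:          # 句子太短，没有更高阶的 n-gram
            break
        c = ngram_counts(cand_tokens, n)
        matched = sum(min(cnt, c[g]) for g, cnt in q.items())  # 截断匹配计数
        rates.append(matched / sum(q.values()))
    return sum(rates) / len(rates) if rates else 0.0

same = r_bm25_score("the cat sat".split(), "the cat sat".split())
diff = r_bm25_score("a b c".split(), "x y z".split())
```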

具体地，在领域翻译优化方面，本发明通过结合R-BM25（Best Match 25）检索方法和增量学习框架，显著提升了机器翻译模型在特定领域内的翻译性能。通过精确挑选与待翻译句子领域相似度高的翻译对作为模型输入演示，本发明确保了模型能够接触到与目标领域高度相关的语境和表达方式。这种基于N-gram匹配和语言相似度计算的演示选择方法，不仅加深了模型对特定领域的理解，还增强了其翻译的准确性和自然性。通过这种方法，模型能够动态地从庞大数据集中筛选出最相关的示例，从而在翻译过程中提供丰富且精确的上下文信息，极大地提高了模型在多样化领域翻译任务中的适应性和表现。本发明使用基于检索的示例 $D^{\mathrm{ret}}$ 来进行领域翻译。通过检索评分函数 $\mathrm{score}$ 和检索语料库 $\mathcal{C}$，检索到的R-BM25示例为：Specifically, in terms of domain translation optimization, the present invention significantly improves the translation performance of the machine translation model in specific domains by combining the R-BM25 (Best Match 25) retrieval method with an incremental learning framework. By accurately selecting translation pairs with high domain similarity to the sentence to be translated as model input demonstrations, the present invention ensures that the model can access contexts and expressions highly relevant to the target domain. This demonstration selection method, based on N-gram matching and language similarity calculation, not only deepens the model's understanding of specific domains but also enhances the accuracy and naturalness of its translations. In this way, the model can dynamically filter out the most relevant examples from a large data set, providing rich and accurate contextual information during translation and greatly improving the adaptability and performance of the model in translation tasks across diverse domains. The present invention uses retrieval-based examples $D^{\mathrm{ret}}$ for domain translation. With the retrieval scoring function $\mathrm{score}$ and the retrieval corpus $\mathcal{C}$, the retrieved R-BM25 examples are:

$D^{\mathrm{ret}} = \underset{(x_j,\, y_j) \in \mathcal{C}}{\mathrm{Top}\text{-}k}\ \mathrm{score}(x, x_j)$　(9)

式中，$D^{\mathrm{ret}}$ 表示筛选出的示例，$\mathrm{Top}\text{-}k$ 表示选取 $k$ 个分数最高的示例。In the formula, $D^{\mathrm{ret}}$ denotes the filtered examples, and $\mathrm{Top}\text{-}k$ denotes selecting the $k$ highest-scoring examples.

S233、针对测试集中的目标句子,获取目标句子的前多个句子的翻译源语句和目标语句,将前多个句子的翻译源语句和目标语句输入至训练好的机器翻译大语言模型,得到机器翻译大语言模型生成的翻译示例,根据领域翻译示例以及机器翻译大语言模型生成的翻译示例,得到文档级翻译示例。S233. For a target sentence in a test set, obtain the translation source sentences and target sentences of the first multiple sentences of the target sentence, input the translation source sentences and target sentences of the first multiple sentences into a trained machine translation large language model, obtain translation examples generated by the machine translation large language model, and obtain document-level translation examples based on the domain translation examples and the translation examples generated by the machine translation large language model.

一种可行的实施方式中,综合利用R-BM25方法筛选出的示例和机器翻译大语言模型自我生成的有逻辑的前置翻译内容作为示例进行推理。In a feasible implementation, examples selected by the R-BM25 method and logical pre-translation content generated by the large language model of machine translation are used as examples for reasoning.

其中,目标句子为测试集中的每条数据,对于每条测试数据,把他前面相邻的已经翻译好的多个训练数据作为示例,由于测试集是一段有逻辑的文字,每句话和他前面的几句话也有关联,所以把相邻的已经翻译好的数据作为翻译示例。The target sentence is each piece of data in the test set. For each piece of test data, the multiple adjacent translated training data in front of it are used as examples. Since the test set is a logical text, each sentence is related to the previous sentences, so the adjacent translated data are used as translation examples.

具体地，在文档级翻译优化方面，本发明采用了一种创新的在线适配方法和混合演示策略，有效地利用了文档内部的上下文关系来提升翻译质量。通过实时地选取目标句子前面临近的句子和模型自我生成的翻译作为示例，以及结合通过R-BM25检索到的相关领域示例，本发明的文档级翻译优化策略综合利用模型自我生成的直接上下文信息和外部检索到的领域相关信息，加强了模型对整个文档语境的捕捉和对语句相关知识的学习，还提高了其对文档风格和主题的适应性。为了进一步利用同一文档的内在上下文信息，本发明在翻译一个句子时使用了之前模型自身翻译出的句子。使用 K-shot 文档级演示 $D^{\mathrm{doc}}_i$，第 $i$ 个句子 $x_i$ 的翻译 $\hat{y}_i$ 的生成方式如下：Specifically, in terms of document-level translation optimization, the present invention adopts an innovative online adaptation method and hybrid demonstration strategy, which effectively utilizes the contextual relationships within the document to improve translation quality. By selecting, in real time, the sentences adjacent to the target sentence together with the model's own generated translations as examples, and combining them with relevant domain examples retrieved through R-BM25, the document-level translation optimization strategy of the present invention comprehensively utilizes the direct contextual information generated by the model itself and the externally retrieved domain-related information, strengthening the model's capture of the whole-document context and its learning of sentence-related knowledge, and also improving its adaptability to document style and topic. In order to further exploit the intrinsic contextual information of the same document, the present invention uses the sentences previously translated by the model itself when translating a sentence. Using the K-shot document-level demonstrations $D^{\mathrm{doc}}_i$, the translation $\hat{y}_i$ of the $i$-th sentence $x_i$ is generated as follows:

$D^{\mathrm{doc}}_i = F(x_{i-K}, \hat{y}_{i-K}) \parallel \cdots \parallel F(x_{i-1}, \hat{y}_{i-1}), \quad \hat{y}_i = \arg\max_{y} P\big(y \mid D^{\mathrm{doc}}_i, x_i; \hat{\Theta}\big)$　(10)

进一步地，为了确保在文档级翻译中具有特定信息（如领域、风格或主题）的翻译质量，本发明提出了使用基于检索的演示 $D^{\mathrm{ret}}$ 和文档级的演示 $D^{\mathrm{doc}}$ 的混合示例。因此，最终的混合示例解码准则是：Furthermore, in order to ensure translation quality with specific information (such as domain, style, or theme) in document-level translation, the present invention proposes mixed examples combining the retrieval-based demonstrations $D^{\mathrm{ret}}$ and the document-level demonstrations $D^{\mathrm{doc}}$. Therefore, the final mixed-example decoding criterion is:

$\hat{y}_i = \arg\max_{y} P\big(y \mid D^{\mathrm{ret}} \oplus D^{\mathrm{doc}}_i,\ x_i;\ \hat{\Theta}\big)$　(11)

其中，$k_1$ 和 $k_2$ 分别表示句子级和文档级示例的数量。对于示例总数 $k$，有 $k = k_1 + k_2$，式中的 $\oplus$ 表示拼接。Here, $k_1$ and $k_2$ denote the numbers of sentence-level and document-level examples, respectively. For the total number of examples $k$, we have $k = k_1 + k_2$, where $\oplus$ denotes concatenation.
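公式(11)中的混合示例解码可以草绘为提示拼接：先拼接检索示例与文档级示例（对应 $\oplus$），再接上待翻译句子。其中指令文本、示例内容与模板均为假设。The mixed-example decoding in Eq. (11) can be sketched as prompt concatenation: join the retrieved and document-level demos (the $\oplus$), then append the sentence to translate. The instruction text, demo contents, and template are assumptions.

```python
def build_mixed_prompt(retrieved_demos, doc_demos, source,
                       instruction="Translate the source sentence."):
    """将检索示例 D^ret 与文档级示例 D^doc 拼接（公式(11)中的 ⊕），
    再接上待翻译句子，构成最终推理提示。"""
    demos = list(retrieved_demos) + list(doc_demos)  # 共 k = k1 + k2 个示例
    demo_text = "\n\n".join(f"Source: {s}\nTarget: {t}" for s, t in demos)
    return f"{instruction}\n\n{demo_text}\n\nSource: {source}\nTarget:"

prompt = build_mixed_prompt(
    retrieved_demos=[("医学术语A", "medical term A")],  # 来自 R-BM25 检索（假设内容）
    doc_demos=[("上一句", "previous sentence")],        # 模型自译的前文（假设内容）
    source="待翻译的句子",
)
```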

S24、根据领域翻译示例以及文档级翻译示例,对训练好的机器翻译大语言模型进行优化,得到构建好的示例感知能力提升的机器翻译大语言模型。S24. According to the domain translation examples and the document-level translation examples, the trained machine translation large language model is optimized to obtain a constructed machine translation large language model with improved example perception capability.

表2Table 2

表2展示了基于提示的机器翻译大语言模型、基于微调的机器翻译大语言模型、原始的机器翻译大语言模型和本发明的基于示例感知提升的机器翻译大语言模型的COMET和d-BLEU分数。本发明选取基于微调的机器翻译大语言模型中性能最好的模型作为原始的机器翻译大语言模型,表示本发明使用的基座模型,使用示例的机器翻译大语言模型表示原始的机器翻译大语言模型使用本发明筛选的翻译示例得到的效果,示例感知的机器翻译大语言模型表示使用本发明的方法进行训练并使用本发明筛选的翻译示例得到的效果,7B和13B表示模型的参数量。根据表2的主要结果,本发明发现原始的基于微调的机器翻译大语言模型的性能可能会因示例而受损,而本发明提出的基于示例感知提升的机器翻译大语言模型能够显著且稳定地从示例中学习,在所有数据集上的结果都优于基线,显示了基于示例感知提升的机器翻译大语言模型带来的上下文学习能力的显著提升。最后,基于示例感知提升的机器翻译大语言模型在不同模型大小下均能增强示例感知能力,在所有数据集上都超过了原始的机器翻译大语言模型,展示了基于示例感知提升的机器翻译大语言模型在不同模型大小上的稳定优越性。Table 2 shows the COMET and d-BLEU scores of the prompt-based machine translation large language model, the fine-tuning-based machine translation large language model, the original machine translation large language model, and the example-aware improvement-based machine translation large language model of the present invention. The present invention selects the best-performing model among the fine-tuning-based machine translation large language models as the original machine translation large language model, represents the base model used by the present invention, the example-based machine translation large language model represents the effect obtained by the original machine translation large language model using the translation examples screened by the present invention, the example-aware machine translation large language model represents the effect obtained by training using the method of the present invention and using the translation examples screened by the present invention, and 7B and 13B represent the parameter amounts of the model. 
According to the main results of Table 2, the present invention finds that the performance of the original fine-tuning-based machine translation large language model may be impaired by examples, while the example-aware improvement-based machine translation large language model proposed by the present invention can learn from examples significantly and stably, and the results on all data sets are better than the baseline, showing the significant improvement of the context learning ability brought by the example-aware improvement-based machine translation large language model. Finally, the large language model for machine translation based on example-aware improvement can enhance example-awareness under different model sizes and surpasses the original large language model for machine translation on all datasets, demonstrating the stable superiority of the large language model for machine translation based on example-aware improvement on different model sizes.

本发明实施例中,通过轻量级的示例感知微调方法和示例筛选混合策略,有效整合了基于提示和基于微调的方法,提升了翻译模型在各种翻译场景中的适应性和性能。此外,本发明特别关注于利用示例来促进稀有词汇的翻译,同时提出策略以减轻噪声示范可能对翻译质量造成的负面影响。通过这种方法,本发明不仅能够为特定领域翻译和文档级翻译等特定需求提供定制化解决方案,还能够在不牺牲翻译质量的前提下,显著提高翻译效率和性能。In the embodiment of the present invention, a lightweight example-aware fine-tuning method and an example screening hybrid strategy are used to effectively integrate the prompt-based and fine-tuning methods, thereby improving the adaptability and performance of the translation model in various translation scenarios. In addition, the present invention pays special attention to using examples to promote the translation of rare words, while proposing strategies to mitigate the negative impact that noise demonstrations may have on translation quality. In this way, the present invention can not only provide customized solutions for specific needs such as translation in specific fields and document-level translation, but also significantly improve translation efficiency and performance without sacrificing translation quality.

图5是根据一示例性实施例示出的一种基于示例感知的机器翻译大语言模型的翻译装置框图,该装置用于基于示例感知的机器翻译大语言模型的翻译方法。参照图5,该装置包括获取模块310、输入模块320以及输出模块330。其中:FIG5 is a block diagram of a translation device based on an example-aware machine translation large language model according to an exemplary embodiment, and the device is used for a translation method based on an example-aware machine translation large language model. Referring to FIG5 , the device includes an acquisition module 310, an input module 320, and an output module 330. Among them:

获取模块310,用于获取待翻译的数据。The acquisition module 310 is used to acquire data to be translated.

输入模块320,用于将数据输入到构建好的示例感知能力提升的机器翻译大语言模型。The input module 320 is used to input data into the constructed large language model of machine translation with improved example perception capability.

输出模块330,用于根据数据以及示例感知能力提升的机器翻译大语言模型,得到翻译结果。The output module 330 is used to obtain a translation result based on the machine translation large language model with improved data and example perception capabilities.

其中,示例感知能力的机器翻译大语言模型的构建过程,包括:The process of constructing a large language model for machine translation with example-aware capabilities includes:

S21、构造句子级感知示例以及文档级感知示例。S21. Construct sentence-level perception examples and document-level perception examples.

S22、根据句子级感知示例以及文档级感知示例,构造示例感知训练数据,通过低秩适应LoRA微调技术,得到训练好的机器翻译大语言模型。S22. Construct example-aware training data based on sentence-level perception examples and document-level perception examples, and obtain a trained large language model for machine translation through low-rank adaptive LoRA fine-tuning technology.

S23、构造领域翻译示例以及文档级翻译示例。S23. Construct domain translation examples and document-level translation examples.

S24、根据领域翻译示例以及文档级翻译示例，对训练好的机器翻译大语言模型进行优化，得到构建好的示例感知能力提升的机器翻译大语言模型。S24. According to the domain translation examples and the document-level translation examples, the trained machine translation large language model is optimized to obtain a constructed machine translation large language model with improved example perception capability.

可选地,输入模块320,进一步用于:Optionally, the input module 320 is further configured to:

S211、获取原始训练集。S211. Obtain the original training set.

S212、在原始训练集中,随机选取多个翻译对,将多个翻译对作为句子级感知示例。S212. In the original training set, randomly select multiple translation pairs and use the multiple translation pairs as sentence-level perception examples.

S213、选取原始训练集中的任一翻译对,获取原始训练集中所选取的翻译对的前多个翻译对,将获取的多个翻译对作为文档级感知示例。S213: Select any translation pair in the original training set, obtain the first multiple translation pairs of the selected translation pair in the original training set, and use the obtained multiple translation pairs as document-level perception examples.

可选地,句子级感知示例,如下式(1)所示:Optionally, a sentence-level perception example is shown in the following formula (1):

$d_i = F(x_i, y_i), \quad D^{\mathrm{sent}} = d_1 \parallel d_2 \parallel \cdots \parallel d_k, \quad i = 1, \ldots, k$　(1)

式中，$d_i$ 表示第 $i$ 个翻译对的句子级感知示例，$k$ 表示翻译对的数量，$F$ 表示示例的格式，$x_i$ 表示第 $i$ 个翻译对的源句子，$y_i$ 表示第 $i$ 个翻译对的目标句子。In the formula, $d_i$ denotes the sentence-level perception example of the $i$-th translation pair, $k$ denotes the number of translation pairs, $F$ denotes the format of the example, $x_i$ denotes the source sentence of the $i$-th translation pair, and $y_i$ denotes the target sentence of the $i$-th translation pair.

可选地,文档级感知示例,如下式(2)所示:Optionally, a document-level perception example is shown in the following formula (2):

$D^{\mathrm{doc}}_i = F(x_{i-j}, y_{i-j}) \parallel \cdots \parallel F(x_{i-1}, y_{i-1}), \quad 1 \le j \le k$　(2)

式中，$D^{\mathrm{doc}}_i$ 表示文档级感知示例，$i$ 表示句子在原始训练集中的顺序，$k$ 表示翻译对的数量，$F$ 表示示例的格式，$x_{i-j}$ 表示在第 $i$ 个句子前面的第 $j$ 个源句子，$y_{i-j}$ 表示在第 $i$ 个句子前面的第 $j$ 个目标句子。In the formula, $D^{\mathrm{doc}}_i$ denotes the document-level perception example, $i$ denotes the order of the sentence in the original training set, $k$ denotes the number of translation pairs, $F$ denotes the format of the example, $x_{i-j}$ denotes the $j$-th source sentence before the $i$-th sentence, and $y_{i-j}$ denotes the $j$-th target sentence before the $i$-th sentence.

可选地,输入模块320,进一步用于:Optionally, the input module 320 is further configured to:

S221、将句子级感知示例以及文档级感知示例与原始训练数据拼接,拼接后的训练数据通过伯努利概率进行混合,得到示例感知训练数据。S221. Concatenate the sentence-level perception examples and the document-level perception examples with the original training data, and mix the concatenated training data through Bernoulli probability to obtain example-perception training data.

S222、获取基础的机器翻译大语言模型。S222. Obtain a basic large language model for machine translation.

S223、采用低秩适应LoRA微调技术,对基础的机器翻译大语言模型添加可调参数,得到参数可调的机器翻译大语言模型。S223. Use low-rank adaptive LoRA fine-tuning technology to add adjustable parameters to the basic machine translation large language model to obtain a machine translation large language model with adjustable parameters.

S224、根据示例感知训练数据,对参数可调的机器翻译大语言模型进行训练,得到训练好的机器翻译大语言模型。S224. According to the example perception training data, a large machine translation language model with adjustable parameters is trained to obtain a trained large machine translation language model.

可选地,参数可调的机器翻译大语言模型的训练损失,如下式(3)所示:Optionally, the training loss of the large language model of machine translation with adjustable parameters is as shown in the following formula (3):

$\mathcal{L}(\Delta\Theta) = \mathrm{CE}\big(P(y \mid D, x, I; \hat{\Theta}),\ y\big), \quad \hat{\Theta} = \Theta + \Delta\Theta$　(3)

式中，$\mathcal{L}$ 表示参数可调的机器翻译大语言模型的训练损失，$\mathrm{CE}$ 表示交叉熵损失，$P(y \mid D, x, I; \hat{\Theta})$ 表示目标句子 $y$ 的概率分布，$y$ 表示目标句子，$x$ 表示翻译的源句子，$I$ 表示翻译中的指令，$D$ 表示 $k$ 个翻译对的集合，$\Theta$ 表示基础的机器翻译大语言模型中在训练过程中被冻结的参数，$\Delta\Theta$ 表示可调参数。In the formula, $\mathcal{L}$ denotes the training loss of the machine translation large language model with adjustable parameters, $\mathrm{CE}$ denotes the cross-entropy loss, $P(y \mid D, x, I; \hat{\Theta})$ denotes the probability distribution of the target sentence $y$, $y$ denotes the target sentence, $x$ denotes the source sentence to be translated, $I$ denotes the translation instruction, $D$ denotes the set of $k$ translation pairs, $\Theta$ denotes the parameters of the basic machine translation large language model that are frozen during training, and $\Delta\Theta$ denotes the adjustable parameters.

可选地,输入模块320,进一步用于:Optionally, the input module 320 is further configured to:

S231、获取原始训练集。S231. Obtain an original training set.

S232、根据R-BM25检索方法,对原始训练集中的翻译对进行打分排序,选取打分超过预设阈值的翻译对作为领域翻译示例。S232. According to the R-BM25 retrieval method, the translation pairs in the original training set are scored and sorted, and the translation pairs with scores exceeding a preset threshold are selected as domain translation examples.

S233、针对测试集中的目标句子,获取目标句子的前多个句子的翻译源语句和目标语句,将前多个句子的翻译源语句和目标语句输入至训练好的机器翻译大语言模型,得到机器翻译大语言模型生成的翻译示例,根据领域翻译示例以及机器翻译大语言模型生成的翻译示例,得到文档级翻译示例。S233. For a target sentence in a test set, obtain the translation source sentences and target sentences of the first multiple sentences of the target sentence, input the translation source sentences and target sentences of the first multiple sentences into a trained machine translation large language model, obtain translation examples generated by the machine translation large language model, and obtain document-level translation examples based on the domain translation examples and the translation examples generated by the machine translation large language model.

本发明实施例中,通过轻量级的示例感知微调方法和示例筛选混合策略,有效整合了基于提示和基于微调的方法,提升了翻译模型在各种翻译场景中的适应性和性能。此外,本发明特别关注于利用示例来促进稀有词汇的翻译,同时提出策略以减轻噪声示范可能对翻译质量造成的负面影响。通过这种方法,本发明不仅能够为特定领域翻译和文档级翻译等特定需求提供定制化解决方案,还能够在不牺牲翻译质量的前提下,显著提高翻译效率和性能。In the embodiment of the present invention, a lightweight example-aware fine-tuning method and an example screening hybrid strategy are used to effectively integrate the prompt-based and fine-tuning methods, thereby improving the adaptability and performance of the translation model in various translation scenarios. In addition, the present invention pays special attention to using examples to promote the translation of rare words, while proposing strategies to mitigate the negative impact that noise demonstrations may have on translation quality. In this way, the present invention can not only provide customized solutions for specific needs such as translation in specific fields and document-level translation, but also significantly improve translation efficiency and performance without sacrificing translation quality.

图6是本发明实施例提供的一种机器翻译大语言模型的翻译设备的结构示意图,如图6所示,机器翻译大语言模型的翻译设备可以包括上述图5所示的基于示例感知的机器翻译大语言模型的翻译装置。可选地,机器翻译大语言模型的翻译设备410可以包括第一处理器2001。FIG6 is a schematic diagram of the structure of a translation device for a large language model of machine translation provided by an embodiment of the present invention. As shown in FIG6 , the translation device for a large language model of machine translation may include the translation device for the large language model of machine translation based on example perception shown in FIG5 . Optionally, the translation device 410 for a large language model of machine translation may include a first processor 2001.

可选地,机器翻译大语言模型的翻译设备410还可以包括存储器2002和收发器2003。Optionally, the translation device 410 for machine translation of a large language model may further include a memory 2002 and a transceiver 2003 .

其中,第一处理器2001与存储器2002以及收发器2003,如可以通过通信总线连接。The first processor 2001, the memory 2002 and the transceiver 2003 may be connected via a communication bus.

下面结合图6对机器翻译大语言模型的翻译设备410的各个构成部件进行具体的介绍:The following is a detailed introduction to the various components of the translation device 410 of the machine translation large language model in conjunction with FIG. 6 :

其中,第一处理器2001是机器翻译大语言模型的翻译设备410的控制中心,可以是一个处理器,也可以是多个处理元件的统称。例如,第一处理器2001是一个或多个中央处理器(central processing unit,CPU),也可以是特定集成电路(application specificintegrated circuit,ASIC),或者是被配置成实施本发明实施例的一个或多个集成电路,例如:一个或多个微处理器(digital signal processor,DSP),或,一个或者多个现场可编程门阵列(field programmable gate array,FPGA)。The first processor 2001 is the control center of the translation device 410 of the machine translation large language model, and can be a processor or a general term for multiple processing elements. For example, the first processor 2001 is one or more central processing units (CPUs), or an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention, such as one or more microprocessors (digital signal processors, DSPs), or one or more field programmable gate arrays (field programmable gate arrays, FPGAs).

可选地,第一处理器2001可以通过运行或执行存储在存储器2002内的软件程序,以及调用存储在存储器2002内的数据,执行机器翻译大语言模型的翻译设备410的各种功能。Optionally, the first processor 2001 may perform various functions of the translation device 410 for machine translating a large language model by running or executing a software program stored in the memory 2002 and calling data stored in the memory 2002 .

在具体的实现中,作为一种实施例,第一处理器2001可以包括一个或多个CPU,例如图6中所示出的CPU0和CPU1。In a specific implementation, as an embodiment, the first processor 2001 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 6 .

在具体实现中,作为一种实施例,机器翻译大语言模型的翻译设备410也可以包括多个处理器,例如图6中所示的第一处理器2001和第二处理器2004。这些处理器中的每一个可以是一个单核处理器(single-CPU),也可以是一个多核处理器(multi-CPU)。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。In a specific implementation, as an embodiment, the translation device 410 of the machine translation large language model may also include multiple processors, such as the first processor 2001 and the second processor 2004 shown in FIG6 . Each of these processors may be a single-core processor (single-CPU) or a multi-core processor (multi-CPU). The processor here may refer to one or more devices, circuits, and/or processing cores for processing data (such as computer program instructions).

其中,所述存储器2002用于存储执行本发明方案的软件程序,并由第一处理器2001来控制执行,具体实现方式可以参考上述方法实施例,此处不再赘述。The memory 2002 is used to store the software program for executing the solution of the present invention, and is controlled to be executed by the first processor 2001. The specific implementation method can refer to the above method embodiment, which will not be repeated here.

可选地,存储器2002可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compactdisc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器2002可以和第一处理器2001集成在一起,也可以独立存在,并通过机器翻译大语言模型的翻译设备410的接口电路(图6中未示出)与第一处理器2001耦合,本发明实施例对此不作具体限定。Optionally, the memory 2002 may be a read-only memory (ROM) or other types of static storage devices capable of storing static information and instructions, a random access memory (RAM) or other types of dynamic storage devices capable of storing information and instructions, or an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compressed optical disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium capable of carrying or storing the desired program code in the form of an instruction or data structure and capable of being accessed by a computer, but not limited thereto. The memory 2002 may be integrated with the first processor 2001, or may exist independently, and be coupled to the first processor 2001 through an interface circuit (not shown in FIG. 6 ) of the translation device 410 for the large language model of machine translation, which is not specifically limited in the embodiment of the present invention.

收发器2003,用于与网络设备通信,或者与终端设备通信。The transceiver 2003 is used to communicate with a network device or a terminal device.

可选地,收发器2003可以包括接收器和发送器(图6中未单独示出)。其中,接收器用于实现接收功能,发送器用于实现发送功能。Optionally, the transceiver 2003 may include a receiver and a transmitter (not shown separately in FIG6 ), wherein the receiver is used to implement a receiving function, and the transmitter is used to implement a sending function.

可选地,收发器2003可以和第一处理器2001集成在一起,也可以独立存在,并通过机器翻译大语言模型的翻译设备410的接口电路(图6中未示出)与第一处理器2001耦合,本发明实施例对此不作具体限定。Optionally, the transceiver 2003 can be integrated with the first processor 2001, or can exist independently and be coupled to the first processor 2001 through the interface circuit (not shown in FIG. 6 ) of the translation device 410 of the machine translation large language model, which is not specifically limited in the embodiment of the present invention.

需要说明的是，图6中示出的机器翻译大语言模型的翻译设备410的结构并不构成对该设备的限定，实际的机器翻译大语言模型的翻译设备可以包括比图示更多或更少的部件，或者组合某些部件，或者不同的部件布置。It should be noted that the structure of the translation device 410 shown in FIG6 does not constitute a limitation on the device; an actual translation device for the machine translation large language model may include more or fewer components than shown, combine certain components, or arrange the components differently.

此外,机器翻译大语言模型的翻译设备410的技术效果可以参考上述方法实施例所述的基于示例感知的机器翻译大语言模型的翻译方法的技术效果,此处不再赘述。In addition, the technical effects of the translation device 410 for the large language model of machine translation can refer to the technical effects of the translation method for the large language model of machine translation based on example perception described in the above method embodiment, which will not be repeated here.

应理解,在本发明实施例中的第一处理器2001可以是中央处理单元(centralprocessing unit,CPU),该处理器还可以是其他通用处理器、数字信号处理器(digitalsignal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。It should be understood that the first processor 2001 in the embodiment of the present invention may be a central processing unit (CPU), and the processor may also be other general-purpose processors, digital signal processors (DSP), application specific integrated circuits (ASIC), field programmable gate arrays (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may also be any conventional processor, etc.

还应理解,本发明实施例中的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的随机存取存储器(random accessmemory,RAM)可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。It should also be understood that the memory in the embodiments of the present invention may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. Among them, the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of random access memory (RAM) are available, such as static RAM (SRAM), dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link DRAM (SLDRAM), and direct rambus RAM (DR RAM).

The above embodiments may be implemented in whole or in part by software, hardware (such as circuits), firmware, or any combination thereof. When implemented by software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (for example, infrared or microwave) manner. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive.

It should be understood that the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: A alone, both A and B, and B alone, where A and B may be singular or plural. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects, but may also indicate an "and/or" relationship; refer to the context for details.

In the present invention, "at least one" means one or more, and "a plurality of" means two or more. "At least one of the following" or similar expressions refers to any combination of these items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple.

It should be understood that, in the various embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.

Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the devices, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.

In the several embodiments provided by the present invention, it should be understood that the disclosed devices, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is merely a logical functional division, and there may be other division methods in actual implementation. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention essentially, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed by the present invention, and these should be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A translation method of an example-aware machine translation large language model, the method comprising:
S1, acquiring data to be translated;
S2, inputting the data into a constructed machine translation large language model with improved example perceptibility;
S3, obtaining a translation result according to the data and the machine translation large language model with the improved example perceptibility;
the construction process of the machine translation large language model with the improved example perceptibility comprises the following steps:
S21, constructing sentence-level perception examples and document-level perception examples;
S22, constructing example perception training data according to the sentence-level perception examples and the document-level perception examples; according to the example perception training data, a trained machine translation large language model is obtained through a low-rank adaptation LoRA fine tuning technology;
S23, constructing a domain translation example and a document level translation example;
And S24, according to the domain translation example and the document level translation example, performing translation optimization on the trained machine translation large language model to obtain a constructed machine translation large language model with improved example perceptibility.
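Step S22 relies on low-rank adaptation (LoRA), which freezes the base model weights and trains only a low-rank update. A minimal NumPy sketch of the idea (the class and variable names are illustrative, not taken from the patent):

```python
import numpy as np

class LoRALinear:
    """A linear layer with frozen base weights W plus a trainable
    low-rank update B @ A of rank r, as in LoRA fine-tuning."""

    def __init__(self, W, r, alpha=1.0, rng=None):
        rng = rng or np.random.default_rng(0)
        d_out, d_in = W.shape
        self.W = W                                        # frozen base parameters
        self.A = rng.normal(scale=0.01, size=(r, d_in))   # trainable
        self.B = np.zeros((d_out, r))                     # zero-init: starts at base model
        self.scale = alpha / r

    def forward(self, x):
        # Base output plus the scaled low-rank correction.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

layer = LoRALinear(np.eye(3), r=1)
x = np.array([1.0, 2.0, 3.0])
print(layer.forward(x))  # equals x: the zero-initialized update leaves the base intact
```

During fine-tuning only A and B would receive gradients while W stays fixed, which is what makes the adaptation in S22 cheap relative to full-model training.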
2. The translation method of an example-aware machine translation large language model according to claim 1, wherein the constructing of sentence-level perception examples and document-level perception examples in S21 comprises:
S211, acquiring an original training set;
S212, randomly selecting a plurality of translation pairs in the original training set, and taking the plurality of translation pairs as sentence-level perception examples;
S213, selecting any translation pair in the original training set, acquiring a plurality of translation pairs before the selected translation pair in the original training set, and taking the acquired translation pairs as document-level perception examples.
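Steps S211–S213 can be sketched in a few lines of Python; the function names and the toy training set are our own illustration, not prescribed by the patent:

```python
import random

def sentence_level_examples(train_set, n, rng=None):
    """S212: randomly select n translation pairs as sentence-level perception examples."""
    rng = rng or random.Random(0)
    return rng.sample(train_set, n)

def document_level_example(train_set, t, n):
    """S213: take the n translation pairs immediately preceding position t."""
    return train_set[max(0, t - n):t]

# toy "original training set" of (source, target) pairs
train_set = [(f"src{i}", f"tgt{i}") for i in range(10)]
sent_ex = sentence_level_examples(train_set, n=3)
doc_ex = document_level_example(train_set, t=5, n=2)
print(doc_ex)  # the two pairs just before position 5: [('src3', 'tgt3'), ('src4', 'tgt4')]
```

The key difference the two example types encode: sentence-level examples are sampled without regard to position, while document-level examples preserve the document order around the selected pair.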
3. The method for translating a large language model for example-aware machine translation of claim 2, wherein said sentence-level aware examples are represented by the following formula (1):
$$E_i^{s} = \mathrm{Format}(x_i, y_i), \quad i = 1, \ldots, n \tag{1}$$
where $E_i^{s}$ denotes the sentence-level perception example of the $i$-th translation pair, $n$ denotes the number of translation pairs, $\mathrm{Format}(\cdot)$ denotes the format of the examples, $x_i$ denotes the source sentence of the $i$-th translation pair, and $y_i$ denotes the target sentence of the $i$-th translation pair.
4. The method for translating a large language model for example-aware machine translation of claim 2, wherein said document-level aware examples are represented by the following formula (2):
$$E^{d} = \mathrm{Format}\big((x_{t-n}, y_{t-n}), \ldots, (x_{t-1}, y_{t-1})\big) \tag{2}$$
where $E^{d}$ denotes the document-level perception example, $t$ denotes the position of the selected sentence in the original training set, $n$ denotes the number of translation pairs, $\mathrm{Format}(\cdot)$ denotes the format of the examples, $x_{t-j}$ denotes the source sentence of the $j$-th sentence preceding position $t$, and $y_{t-j}$ denotes the corresponding target sentence.
5. The translation method of an example-aware machine translation large language model according to claim 1, wherein, in S22, constructing example perception training data according to the sentence-level perception examples and document-level perception examples, and obtaining a trained machine translation large language model through the low-rank adaptation LoRA fine-tuning technique according to the example perception training data, comprises:
S221, splicing the sentence-level perception examples and the document-level perception examples with original training data, and mixing the spliced training data through Bernoulli probability to obtain example perception training data;
S222, acquiring a basic machine translation large language model;
S223, adding adjustable parameters to the basic machine translation large language model by adopting the low-rank adaptation LoRA fine-tuning technique to obtain a machine translation large language model with adjustable parameters;
S224, training the machine translation large language model with the adjustable parameters according to the example perception training data to obtain a trained machine translation large language model.
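Step S221's Bernoulli mixing — prepending the perception examples to a training pair only with some probability p — can be sketched as follows (the prompt layout and "=>" separator are our assumptions; the patent does not fix a concrete format):

```python
import random

def mix_with_examples(pair, examples, p, rng=None):
    """With Bernoulli probability p, prepend perception examples to a training pair;
    otherwise keep the plain pair, so the model sees both input styles."""
    rng = rng or random.Random(0)
    src, tgt = pair
    if rng.random() < p:
        context = "\n".join(f"{s} => {t}" for s, t in examples)
        return (context + "\n" + src, tgt)
    return (src, tgt)

examples = [("hello", "bonjour"), ("world", "monde")]
always = mix_with_examples(("cat", "chat"), examples, p=1.0)
never = mix_with_examples(("cat", "chat"), examples, p=0.0)
print(always[0])
print(never)  # ('cat', 'chat')
```

Mixing with a probability strictly between 0 and 1 is the point of S221: the resulting training set contains both example-augmented and plain pairs, so the fine-tuned model works with or without in-context examples at inference time.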
6. The method for translating a large machine translation language model based on example perception according to claim 5, wherein the training loss of the large machine translation language model with adjustable parameters in S223 is represented by the following formula (3):
$$\mathcal{L}(\Delta\Theta) = \sum_{(x, y) \in \mathcal{D}} \mathrm{CE}\big(y,\ P(y \mid I, x;\ \Theta, \Delta\Theta)\big) \tag{3}$$
where $\mathcal{L}$ denotes the training loss of the machine translation large language model with adjustable parameters, $\mathrm{CE}$ denotes the cross-entropy loss, $P(y \mid I, x;\ \Theta, \Delta\Theta)$ denotes the probability distribution of the target sentence $y$, $y$ denotes the target sentence, $x$ denotes the source sentence to be translated, $I$ denotes the translation instruction, $\mathcal{D}$ denotes the set of $N$ translation pairs, $\Theta$ denotes the parameters of the base machine translation large language model, which are frozen during training, and $\Delta\Theta$ denotes the adjustable parameters.
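The cross-entropy term in formula (3) sums the negative log-probability the model assigns to each target token. A toy NumPy version over a 4-token vocabulary (the per-position distributions are made up for illustration):

```python
import numpy as np

def cross_entropy(probs, target_ids):
    """-sum_t log P(y_t): probs is (T, V), one distribution per target position."""
    return -np.sum(np.log(probs[np.arange(len(target_ids)), target_ids]))

# distributions a model might output for a 3-token target sentence
probs = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.8, 0.05, 0.05],
    [0.25, 0.25, 0.25, 0.25],
])
loss = cross_entropy(probs, [0, 1, 2])
print(loss)  # -(log 0.7 + log 0.8 + log 0.25) ≈ 1.9661
```

In the LoRA setting of claim 6, only the adjustable parameters would be updated to minimize this loss; the frozen base parameters merely shape the predicted distributions.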
7. The translation method of an example-aware machine translation large language model according to claim 1, wherein the constructing of domain translation examples and document-level translation examples in S23 comprises:
S231, acquiring an original training set;
S232, according to an R-BM25 retrieval method, scoring and sorting are carried out on translation pairs in an original training set, and translation pairs with scores exceeding a preset threshold are selected as field translation examples;
S233, aiming at a target sentence in a test set, acquiring translation source sentences and target sentences of a plurality of sentences in front of the target sentence, inputting the translation source sentences and target sentences of the plurality of sentences in front to a trained machine translation large language model to obtain a translation example generated by the machine translation large language model, and obtaining a document-level translation example according to the field translation example and the translation example generated by the machine translation large language model.
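Step S232's retrieval scoring can be illustrated with plain BM25 (the patent's R-BM25 variant adds retrieval refinements not reproduced here; the tokenization and threshold value are our assumptions):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Standard BM25 score of each doc (a token list) against the query tokens."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()
    for d in docs:
        df.update(set(d))                      # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [["machine", "translation", "model"],
        ["large", "language", "model"],
        ["weather", "report"]]
scores = bm25_scores(["translation", "model"], docs)
selected = [i for i, s in enumerate(scores) if s > 0.1]  # threshold selection, S232
print(selected)  # [0, 1]: only the first two docs share terms with the query
```

Documents scoring above the threshold play the role of the domain translation examples: they are the training pairs lexically closest to the sentence being translated.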
8. A translation apparatus of an example-aware machine translation large language model for implementing the translation method of an example-aware machine translation large language model according to any one of claims 1 to 7, characterized in that the apparatus comprises:
The acquisition module is used for acquiring data to be translated;
The input module is used for inputting the data into the constructed machine translation large language model with improved example perceptibility;
the output module is used for obtaining a translation result according to the data and the machine translation large language model with the improved example perceptibility;
the construction process of the machine translation large language model with the improved example perceptibility comprises the following steps:
S21, constructing sentence-level perception examples and document-level perception examples;
S22, constructing example perception training data according to the sentence-level perception examples and the document-level perception examples; according to the example perception training data, a trained machine translation large language model is obtained through a low-rank adaptation LoRA fine tuning technology;
S23, constructing a domain translation example and a document level translation example;
And S24, optimizing the trained machine translation large language model according to the domain translation example and the document level translation example to obtain a constructed machine translation large language model with improved example perceptibility.
9. A translation apparatus for machine translating a large language model, the translation apparatus for machine translating a large language model comprising:
A processor;
a memory having stored thereon computer readable instructions which, when executed by the processor, implement the method of any of claims 1 to 7.
10. A computer readable storage medium having stored therein program code which is callable by a processor to perform the method of any one of claims 1 to 7.
CN202410933627.6A 2024-07-12 2024-07-12 Translation method and device of machine translation large language model based on example perception Active CN118468899B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410933627.6A CN118468899B (en) 2024-07-12 2024-07-12 Translation method and device of machine translation large language model based on example perception


Publications (2)

Publication Number Publication Date
CN118468899A true CN118468899A (en) 2024-08-09
CN118468899B CN118468899B (en) 2024-09-24

Family

ID=92150305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410933627.6A Active CN118468899B (en) 2024-07-12 2024-07-12 Translation method and device of machine translation large language model based on example perception

Country Status (1)

Country Link
CN (1) CN118468899B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119721064A (en) * 2024-11-20 2025-03-28 广东工业大学 A machine translation method and system based on large language model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160124944A1 (en) * 2014-11-04 2016-05-05 Xerox Corporation Predicting the quality of automatic translation of an entire document
CN114896992A (en) * 2022-04-28 2022-08-12 南京大学 Method, medium and device for improving automatic evaluation of machine translation quality by using retrieval
CN116992894A (en) * 2023-09-26 2023-11-03 北京澜舟科技有限公司 A training method and computer-readable storage medium for machine translation models


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Min et al., "A Survey of Document-Level Neural Machine Translation", Journal of Software (《软件学报》), 5 July 2024 (2024-07-05), pages 1-16 *


Also Published As

Publication number Publication date
CN118468899B (en) 2024-09-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant