
CN111738437B - Training method, text generation device and electronic equipment - Google Patents


Info

Publication number
CN111738437B
Authority
CN
China
Prior art keywords
length
text
control vector
length control
vector
Prior art date
Legal status
Active
Application number
CN202010689980.6A
Other languages
Chinese (zh)
Other versions
CN111738437A (en)
Inventor
梁忠平
温祖杰
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010689980.6A
Publication of CN111738437A
Application granted
Publication of CN111738437B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3347 Query execution using vector based model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/34 Browsing; Visualisation therefor
    • G06F 16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

One or more embodiments of the present specification provide a training method, a text generation method, an apparatus, and an electronic device. The training process adopts a Teacher-Student framework in which a teacher generation model and a student generation model are trained jointly. The teacher generation model is a general text generation model from which the student generation model learns how to generate text. The student generation model introduces a first length control vector for controlling the maximum length of the output text and a second length control vector for controlling the minimum length of the output text, and through these two vectors the student generation model is trained to control the length of its output text. Based on a reinforcement learning method, a return value yields a joint loss of the teacher generation model and the student generation model, and the student generation model is trained on this loss to obtain a generation model with controllable output text length, which is used to generate output text of controllable length.

Description

Training method, text generation device and electronic equipment
Technical Field
One or more embodiments of the present disclosure relate to the field of natural language processing technologies, and in particular, to a training method, a text generation method, an apparatus, and an electronic device.
Background
Text generation is a widely used natural language processing technology that underlies many tasks, such as question answering systems, chat systems, and article summarization systems. In some application scenarios, there are constraints on the length of the generated text. For example, in news summarization, the generated summary should be neither too short nor too long because of limits on page size and display layout: a summary that is too short leaves an unsightly blank area, while one that is too long cannot be displayed in full. As another example, in a chat system there is a length requirement on machine-generated responses: a response that is too long burdens the reader and degrades the product experience, while one that is too short loses information and makes the chat uninteresting. However, existing text generation methods do not take into account limiting the length of the generated text.
Therefore, how to effectively control the length of the generated text is a problem that needs to be solved in the technical field of natural language processing at present.
Disclosure of Invention
In view of this, an object of one or more embodiments of the present disclosure is to provide a training method, a text generation method and apparatus, and an electronic device.
In view of the above, one or more embodiments of the present specification provide a training method for a generation model with controllable output text length, including:
acquiring a sample input text;
inputting the sample input text into a teacher generation model to obtain a first probability distribution corresponding to the sample input text;
constructing a first length control vector and a second length control vector; wherein the first length control vector is used for controlling the maximum length of the output text, and the second length control vector is used for controlling the minimum length of the output text;
inputting the sample input text, the first length control vector and the second length control vector into a student generation model to obtain a second probability distribution corresponding to the sample input text;
obtaining a return value for reinforcement learning according to the first probability distribution, the second probability distribution, the first length control vector and the second length control vector;
obtaining the loss of the student generation model according to the return value and the second probability distribution;
and training the student generated model by taking the minimum loss of the student generated model as a training target so as to obtain a generated model with controllable output text length when training is finished.
Based on the same inventive concept, one or more embodiments of the present specification further provide a text generation method, including:
acquiring an input text;
constructing a first length control vector and a second length control vector; the first length control vector is used for controlling the maximum length of an output text corresponding to the input text, and the second length control vector is used for controlling the minimum length of the output text corresponding to the input text;
and inputting the input text, the first length control vector and the second length control vector into the generation model with controllable output text length obtained by training according to any one of the training methods to obtain the output text corresponding to the input text.
Based on the same inventive concept, one or more embodiments of the present specification further provide a training apparatus, including:
a first obtaining module configured to obtain a sample input text;
a first determining module configured to input the sample input text into a teacher generated model to obtain a first probability distribution corresponding to the sample input text;
a first construction module configured to construct a first length control vector and a second length control vector; wherein the first length control vector is used for controlling the maximum length of the output text, and the second length control vector is used for controlling the minimum length of the output text;
a second determining module configured to input the sample input text, the first length control vector and the second length control vector into a student generation model to obtain a second probability distribution corresponding to the sample input text;
a reward value determination module configured to derive a reward value for reinforcement learning from the first probability distribution, the second probability distribution, the first length control vector, and the second length control vector;
a first loss determination module configured to derive a loss for the student-generated model from the reward value and the second probability distribution;
and the first training module is configured to train the student generated model by taking the minimum loss of the student generated model as a training target so as to obtain the generated model with controllable output text length at the end of training.
Based on the same inventive concept, one or more embodiments of the present specification further provide a text generation apparatus, including:
a second obtaining module configured to obtain an input text;
a second construction module configured to construct a first length control vector and a second length control vector; the first length control vector is used for controlling the maximum length of an output text corresponding to the input text, and the second length control vector is used for controlling the minimum length of the output text corresponding to the input text;
and the generating module is configured to input the input text, the first length control vector and the second length control vector into the generation model with controllable output text length obtained by training according to any one of the above training methods, so as to obtain an output text corresponding to the input text.
Based on the same inventive concept, one or more embodiments of the present specification further provide an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the method as described in any one of the above items when executing the program.
As can be seen from the above, in the training method, text generation method, apparatus, and electronic device provided in one or more embodiments of the present specification, the training process adopts a Teacher-Student framework in which a teacher generation model and a student generation model are trained jointly. The teacher generation model is a general text generation model from which the student generation model learns how to generate text. The student generation model introduces a first length control vector for controlling the maximum length of the output text and a second length control vector for controlling the minimum length of the output text, and through these two vectors the student generation model is trained to control the length of its output text. Based on a reinforcement learning method, the return value yields a joint loss of the teacher generation model and the student generation model, and the student generation model is trained on this loss to obtain a generation model with controllable output text length, which is used to generate output text of controllable length. The disclosed scheme effectively controls the length of the generated text and can meet the length requirements of specific text generation tasks.
Drawings
In order to more clearly illustrate one or more embodiments or prior art solutions of the present specification, the drawings that are needed in the description of the embodiments or prior art will be briefly described below, and it is obvious that the drawings in the following description are only one or more embodiments of the present specification, and that other drawings may be obtained by those skilled in the art without inventive effort from these drawings.
FIG. 1 is a flow diagram of a training method in accordance with one or more embodiments of the present disclosure;
FIG. 2 is a schematic diagram of the structure and operation process of a Transformer model;
FIG. 3 is a schematic diagram of the input of a student generated model in one or more embodiments of the present disclosure;
FIG. 4 is a flow diagram of a method for generating text in accordance with one or more embodiments of the disclosure;
FIG. 5 is a schematic diagram of a training apparatus according to one or more embodiments of the present disclosure;
FIG. 6 is a block diagram of a text generation apparatus according to one or more embodiments of the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device according to one or more embodiments of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in one or more embodiments of the specification is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items.
As described in the Background section, existing text generation methods generally do not consider limiting the length of the generated text. In natural language processing, a text generation model is usually obtained by machine learning to perform the text generation task. In the course of implementing the present disclosure, the applicant found that existing text generation models contain no parameter or processing step for controlling the length of their output text, in either the training or the generation process, so they cannot control the length of the text they output.
In view of the above, one or more embodiments of the present specification provide a training method, a text generation method, and related hardware. The training method aims to obtain a generation model with controllable output text length. The training process adopts a Teacher-Student (also called knowledge distillation) framework in which a teacher generation model and a student generation model are trained jointly. The teacher generation model is a general text generation model from which the student generation model learns how to generate text. The student generation model introduces a first length control vector for controlling the maximum length of the output text and a second length control vector for controlling the minimum length of the output text, and through these two vectors the student generation model is trained to control the length of its output text. Based on a reinforcement learning method, a return value reflects both the gap between the output text and the length requirements and the difference in prediction ability between the teacher generation model and the student generation model; the loss of the student generation model is calculated from this return value, so that it becomes a joint loss of the teacher generation model and the student generation model. The student generation model is trained with the goal of minimizing this loss, yielding, at the end of training, a generation model with controllable output text length. At generation time, the input text, the first length control vector, and the second length control vector are fed into this model to obtain the output text corresponding to the input text, whose length is controlled by the first length control vector and the second length control vector so as to meet the length requirements of the specific text generation task.
The technical solutions of one or more embodiments of the present specification are described in detail below with reference to specific embodiments.
One or more embodiments of the present specification provide a training method for a generation model with controllable output text length. Referring to fig. 1, the training method includes the following steps:
step S101, obtaining a sample input text;
step S102, inputting the sample input text into a teacher generation model to obtain a first probability distribution corresponding to the sample input text;
step S103, constructing a first length control vector and a second length control vector; wherein the first length control vector is used for controlling the maximum length of the output text, and the second length control vector is used for controlling the minimum length of the output text;
step S104, inputting the sample input text, the first length control vector and the second length control vector into a student generation model to obtain a second probability distribution corresponding to the sample input text;
step S105, obtaining a return value for reinforcement learning according to the first probability distribution, the second probability distribution, the first length control vector and the second length control vector;
step S106, obtaining the loss of the student generation model according to the return value and the second probability distribution;
step S107, training the student generation model by taking the minimum loss of the student generation model as the training target, so as to obtain a generation model with controllable output text length when training is finished.
In this embodiment, a Teacher-Student training framework is adopted, and the teacher generation model and the student generation model are trained jointly. Both models are based on the Transformer architecture. Referring to fig. 2, which shows the structure of a Transformer model: the model includes a Transformer encoder and a Transformer decoder. The input text is first embedded to obtain word vectors, which are fed into the Transformer encoder; the encoder consists of several sequentially connected Transformer encoding modules, and the last encoding module outputs the encoding vector corresponding to the input text. The encoding vector is fed into the Transformer decoder, which consists of several sequentially connected Transformer decoding modules; the last decoding module outputs the probability distribution of the predicted output word. The Transformer decoder produces output words step by step to obtain the output text: at each step its input is the encoding vector produced by the Transformer encoder together with all output words produced so far, and its output is the output word of the current step.
Fig. 2 shows the structure and working process of the Transformer model only by way of example. It is to be understood that, when the training process is explained in the embodiments described later, the input to the Transformer model is the sample input text in the training sample.
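For illustration only, the stepwise decoding described above can be sketched in Python as follows. The encoder, decoder, bos_id and eos_id names are assumptions made for this sketch, and greedy decoding is shown for simplicity; the embodiment itself uses beam search.

```python
import torch

def decode_stepwise(encoder, decoder, input_ids, bos_id, eos_id, max_steps=50):
    """Sketch of the Transformer generation loop: at each step the decoder sees the
    encoder's encoding vector and all output words produced so far, and emits one word."""
    memory = encoder(input_ids)               # encoding vector of the input text
    output_ids = [bos_id]                     # start symbol
    for _ in range(max_steps):
        prev = torch.tensor([output_ids])     # all output words already produced
        logits = decoder(prev, memory)        # distribution over the next output word
        next_id = int(logits[0, -1].argmax())
        if next_id == eos_id:                 # stop at the end symbol
            break
        output_ids.append(next_id)
    return output_ids[1:]
```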
In this embodiment, a sample input text is first obtained. The sample input text comes from a sample set used to train the teacher generation model and the student generation model. Different corpora can be chosen for the sample set depending on the text generation task. For example, for a question answering task, the sample set may be the question-and-answer history between users and the robot; for a news summarization task, the sample set may be a number of news texts and their corresponding summary texts. For ease of presentation, a sample input text is denoted x_i and its corresponding sample output text is denoted y_i; together they form a training sample (x_i, y_i). Several training samples form a training set {(x_i, y_i)}, i = 1, ..., D, where D denotes the total number of training samples.
In this embodiment, after the sample input text x_i is obtained, x_i is input into the teacher generation model. The teacher generation model includes a first Transformer encoder and a first Transformer decoder. First, the sample input text is embedded: each word in the sample input text is encoded as a one-hot vector and then passed through word embedding to extract the features of each word, yielding a first word vector for each word in the sample input text. The word embedding algorithm can be chosen freely, for example Word2Vec or GloVe.
Then, a first position vector is generated for each word in the sample input text. Since the Transformer model uses global information and does not utilize sequential information of words, position embedding is required for each word in the sample input text to generate a corresponding first position vector to represent the position feature in the text. The specific method for generating the first position vector may adopt an existing arbitrary position embedding method, which is not limited in this embodiment.
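As one concrete possibility (the embodiment leaves the position-embedding method open), the fixed sinusoidal position embedding of the original Transformer could be used; the sketch below is illustrative, not the method required by the embodiment.

```python
import math
import torch

def sinusoidal_position_embeddings(seq_len: int, dim: int) -> torch.Tensor:
    """Return a (seq_len, dim) matrix of fixed sinusoidal position vectors (dim must be even)."""
    positions = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)              # (seq_len, 1)
    div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32) * (-math.log(10000.0) / dim))
    emb = torch.zeros(seq_len, dim)
    emb[:, 0::2] = torch.sin(positions * div)   # even dimensions
    emb[:, 1::2] = torch.cos(positions * div)   # odd dimensions
    return emb
```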
For the first Transformer encoder, the first word vector and the first position vector of each word in the sample input text are linearly combined, and the result is used as the representation vector of that word. In this embodiment the first word vector and the first position vector are simply added, and the sum is input into the first Transformer encoder; after passing through the several Transformer encoding modules of the first Transformer encoder in sequence, the first encoding vector of each word in the sample input text is obtained. Within each Transformer encoding module, the input passes in turn through a Multi-Head Attention layer, an Add & Norm layer, a Feed Forward layer, and another Add & Norm layer. The Multi-Head Attention layer consists of several Self-Attention heads and extracts and combines the semantic features of words in different semantic contexts. The Feed Forward layer is two fully connected hidden layers, and the Add & Norm layer performs residual connection and normalization. Since the method of this embodiment does not modify the structure or operation of these layers, they are not described in further detail here.
For the first Transformer decoder, the first encoding vector produced by the first Transformer encoder is input into the first Transformer decoder, which outputs a sequence of output words step by step. In the first step, the first encoding vector and a start symbol are input into the first Transformer decoder to obtain the output word of the first step. In each subsequent step, the first Transformer decoder generates the output word of the current step from the first encoding vector and all output words already produced. The output of the first Transformer decoder takes the form of a probability distribution; several candidate output word sequences and their probabilities are obtained, and a beam search algorithm then selects the final output word sequence as the output text. For ease of representation, the first probability distribution output by the first Transformer decoder can be written as

p(y_t | x_i, y_<t; θ)

where p is the first probability distribution; y_t is the output word of the teacher generation model at the current step; x_i is the sample input text; y_<t denotes all output words produced by the teacher generation model from the first step up to the step before the current step; and θ denotes the model parameters of the teacher generation model.
After the first probability distribution is obtained, the loss of the teacher generation model is calculated. Its loss function can be written as

L_p = − Σ_{t=1}^{T} log p(y_t | x_i, y_<t; θ)

where L_p is the loss of the teacher generation model and T is the actual length of the output text produced by the teacher generation model, i.e. the number of words in that output text, obtained from the beam search statistics. After the loss of the teacher generation model is obtained, the model parameters of the teacher generation model are updated with stochastic gradient descent, taking the minimization of this loss as the training target, thereby training the teacher generation model.
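A minimal sketch of a negative log-likelihood loss of this kind is shown below, assuming per-step decoder logits; treat the exact form as an assumption of this sketch rather than the embodiment's precise formula.

```python
import torch
import torch.nn.functional as F

def teacher_loss(logits: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    """Assumed form of L_p: negative log-likelihood summed over the T output steps.
    logits:     (T, vocab_size) scores from the first Transformer decoder
    target_ids: (T,) indices of the T output words"""
    log_probs = F.log_softmax(logits, dim=-1)
    return -log_probs[torch.arange(target_ids.size(0)), target_ids].sum()
```

Minimizing such a loss with stochastic gradient descent corresponds to the parameter update described above.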
In this embodiment, the sample input text x_i is also input into the student generation model. The student generation model includes a second Transformer encoder and a second Transformer decoder. The input to the second Transformer encoder comprises the second word vectors obtained by embedding the sample input text and the second position vectors obtained by position embedding of the sample input text, and in addition a first length control vector for controlling the maximum length of the output text and a second length control vector for controlling the minimum length of the output text.
Referring to fig. 3, which shows the input of the student generation model. Each rectangular box in fig. 3 represents a vector; all vectors have the same dimension, and in the following description the characters, letters, or numbers inside a box denote the corresponding vector. In this embodiment the sample input text "I love Beijing Tian'an" is taken as an example: word embedding yields the second word vectors of its six words, and a start symbol <B> is added at the head of the sequence, giving all the second word vectors. (1) to (6) are the second position vectors representing the position of the corresponding second word vector in the text; for example, "me" corresponds to (1), indicating that "me" is the first word of the text, and "jing" corresponds to (4), indicating that "jing" is the fourth word of the text.
0 to 6 are the first length control vectors; each corresponds to one second word vector and indicates at most how many output words may still be produced when decoding uses the corresponding second word vector. Taken together, the first length control vectors control the maximum length of the output text. In this embodiment, the first length control vectors restrict the output text to at most six output words. For example, the start symbol "<B>", which is used to generate the output word of the first step during decoding, has first length control vector 6, indicating that at most six output words in total may be generated from the current step onward; the second word vector "jing" has first length control vector 2, indicating that when "jing" is used in decoding to generate an output word, at most two more output words may be generated. When constructing the first length control vectors 0 to 6, the values of their dimensions can be preset arbitrarily; it is only required that the first length control vectors 0 to 6 are mutually distinct. For example, every dimension of the first length control vector 6 may take the value 6.
0' to 4' are the second length control vectors; each corresponds to one second word vector and indicates at least how many output words still need to be produced when decoding uses the corresponding second word vector. Taken together, the second length control vectors control the minimum length of the output text. In this embodiment, the second length control vectors require the output text to contain at least four output words. For example, the start symbol "<B>", which is used to generate the output word of the first step during decoding, has second length control vector 4', indicating that at least four output words in total still need to be generated from the current step onward; the second word vector "jing" has second length control vector 0', indicating that by the time "jing" is used in decoding to generate an output word, the number of output words already generated satisfies the minimum length of the output text. The second length control vectors 0' to 4' can be constructed in the same way as the first length control vectors, that is, it is only necessary that the control vectors are mutually distinct.
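The countdown scheme of fig. 3 can be written down directly. The helper below is an illustrative sketch of one plausible reading of that scheme (integer countdowns from the maximum and minimum lengths, clipped at zero, later mapped to mutually distinct vectors); the function name is an assumption.

```python
def length_control_indices(seq_len, max_len, min_len):
    """Per-position countdown indices for the start symbol plus the input words.
    Fig. 3 example with seq_len=7, max_len=6, min_len=4:
      first  -> [6, 5, 4, 3, 2, 1, 0]   (at most this many output words remain)
      second -> [4, 3, 2, 1, 0, 0, 0]   (at least this many output words remain)"""
    first = [max(max_len - i, 0) for i in range(seq_len)]
    second = [max(min_len - i, 0) for i in range(seq_len)]
    return first, second
```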
For the second Transformer encoder, referring to fig. 3, the second word vector, the second position vector, the first length control vector, and the second length control vector of each word in the sample input text are linearly combined, and the result is the representation vector of that word. In this embodiment the second word vector, the second position vector, the first length control vector, and the second length control vector of each word are simply added; the sum is input into the second Transformer encoder and, after passing through its several Transformer encoding modules in sequence, yields the second encoding vector of each word in the sample input text. The second Transformer encoder works in the same way as the first Transformer encoder; see the description of the first Transformer encoder above.
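A sketch of that linear combination (direct addition of the four vectors) follows; the embedding tables and sizes here are assumptions used only to make the example self-contained.

```python
import torch
import torch.nn as nn

dim = 512
word_table    = nn.Embedding(30000, dim)   # second word vectors
pos_table     = nn.Embedding(512, dim)     # second position vectors
max_len_table = nn.Embedding(64, dim)      # first length control vectors (countdown indices)
min_len_table = nn.Embedding(64, dim)      # second length control vectors (countdown indices)

def student_encoder_input(word_ids, pos_ids, max_ids, min_ids):
    """Representation vector of each word = sum of its second word vector, second
    position vector, first length control vector and second length control vector;
    the result is fed into the second Transformer encoder."""
    return (word_table(word_ids) + pos_table(pos_ids)
            + max_len_table(max_ids) + min_len_table(min_ids))
```

All four index sequences are LongTensors of the same length (the start symbol plus the words of the sample input text).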
For the second Transformer decoder, the second encoding vector generated by the second Transformer encoder is input into the second Transformer decoder, which outputs the output words step by step. The second Transformer decoder works in the same way as the first Transformer decoder; see the description of the first Transformer decoder above.
For convenience of representation, the second probability distribution output by the second Transformer decoder can be written as

q(y_t | x_i, m, n, y_<t; θ')

where q is the second probability distribution; y_t is the output word of the student generation model at the current step; x_i is the sample input text; m is the first length control vector; n is the second length control vector; y_<t denotes all output words produced by the student generation model from the first step up to the step before the current step; and θ' denotes the model parameters of the student generation model.
In this embodiment, training is carried out with a reinforcement learning method. Specifically, the first probability distribution generated by the teacher generation model, the second probability distribution generated by the student generation model, the first length control vector, and the second length control vector obtained in the preceding steps are used to compute the return value for reinforcement learning. The return value r is a linear combination of three terms: (i) a first length difference, obtained from the maximum length and the actual length, which describes the gap between the actual length of the output text produced by the student generation model and the set maximum length; (ii) a second length difference, obtained from the minimum length and the actual length, which describes the gap between the actual length of the output text produced by the student generation model and the set minimum length; and (iii) the cross entropy of the first probability distribution and the second probability distribution, which describes the difference in prediction ability between the teacher generation model and the student generation model, so that during subsequent training the student generation model learns the teacher generation model's way of generating text. Here r is the return value; T' is the actual length of the output text produced by the student generation model, i.e. the number of words in that output text, obtained from the beam search statistics; m is the maximum length of the output text of the student generation model, determined from the first length control vector; n is the minimum length of the output text of the student generation model, determined from the second length control vector; and u and v are hyper-parameters that respectively control the smoothness of the return value and the weight of the target learning value. The return value is obtained by linearly combining the three terms above.
In this embodiment, after the return value for reinforcement learning is obtained, the loss of the student generation model is calculated from the return value and the second probability distribution generated by the student generation model. This loss is denoted L_q.
Thus the loss of the student generation model incorporates the return value; through the return value it becomes a joint loss of the teacher generation model and the student generation model, realizing the Teacher-Student training framework. Moreover, under the reinforcement learning method, introducing the return value lets the student generation model be continuously optimized during training according to both the text-length differences and the difference in prediction ability between the student generation model and the teacher generation model.
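Because the return value and the student loss are described above only qualitatively, the sketch below shows one plausible shape of the computation: the two length differences and the cross entropy of the two distributions are combined linearly into the return value, and a policy-gradient-style loss weights the negative log-likelihood of the student's output by that return value. All weightings, signs, and names here are assumptions for illustration, not the embodiment's exact formulas.

```python
import torch
import torch.nn.functional as F

def return_value(T_actual, m_max, n_min, p_logits, q_logits, u=1.0, v=1.0):
    """Assumed form of r: length differences w.r.t. the maximum length m and the
    minimum length n, plus the cross entropy of the teacher distribution p and
    the student distribution q, linearly combined with hyper-parameters u and v.
    p_logits, q_logits: (steps, vocab_size) per-step decoder scores (assumed aligned)."""
    over  = max(T_actual - m_max, 0)                 # output exceeds the maximum length
    under = max(n_min - T_actual, 0)                 # output falls short of the minimum length
    p = F.softmax(p_logits, dim=-1)
    cross_entropy = -(p * F.log_softmax(q_logits, dim=-1)).sum(dim=-1).mean()
    return -u * (over + under) - v * cross_entropy

def student_loss(r, q_logits, output_ids):
    """Assumed policy-gradient form of L_q: the return value weights the negative
    log-likelihood of the student's own output sequence (a baseline is often
    subtracted in practice; omitted here for brevity)."""
    output_ids = torch.as_tensor(output_ids)
    log_q = F.log_softmax(q_logits, dim=-1)
    nll = -log_q[torch.arange(output_ids.size(0)), output_ids].sum()
    return float(r) * nll                            # r is treated as a constant weight
```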
The student generation model is trained with the minimization of its loss as the training target; when training ends, the trained student generation model is the generation model with controllable output text length. When this model is used to generate text, given a preset first length control vector and a preset second length control vector it produces output text whose length meets the requirements corresponding to those two vectors.
It can be seen from the above embodiment that, in this training method for a generation model with controllable output text length, the training process adopts the Teacher-Student framework and jointly trains the teacher generation model and the student generation model. The teacher generation model is a general text generation model from which the student generation model learns how to generate text. The student generation model introduces a first length control vector for controlling the maximum length of the output text and a second length control vector for controlling the minimum length of the output text, and through these two vectors it is trained to control the length of its output text. Based on a reinforcement learning method, the return value yields a joint loss of the teacher generation model and the student generation model; training the student generation model on this loss produces a generation model with controllable output text length, which generates output text of controllable length, effectively controls the length of the generated text, and can meet the length requirements of specific text generation tasks.
It should be noted that the method of this embodiment does not modify the internal structure or algorithm flow of the Transformer encoder, the Transformer decoder, or the beam search algorithm used above, so their detailed principles and operation are not described further.
It should be noted that the method of one or more embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of one or more embodiments of the present disclosure, and the devices may interact with each other to complete the method.
It should be noted that the above description describes certain embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, one or more embodiments of the present specification further provide a text generation method. Referring to fig. 4, the text generation method includes the following steps:
step S401, acquiring an input text;
step S402, constructing a first length control vector and a second length control vector; the first length control vector is used for controlling the maximum length of an output text corresponding to the input text, and the second length control vector is used for controlling the minimum length of the output text corresponding to the input text;
step S403, inputting the input text, the first length control vector, and the second length control vector into the generation model with controllable output text length obtained by training according to the training method of any of the above embodiments, so as to obtain an output text corresponding to the input text.
In this embodiment, the first length control vector and the second length control vector are correspondingly constructed according to the length requirement of the output text required by the specific text generation task, and the construction manner may refer to any one of the embodiments of the training method described above, which is not described in detail in this embodiment.
When text is generated, the input text undergoes word embedding and position embedding, is linearly combined with the constructed first length control vector and second length control vector, and the result is input into the generation model with controllable output text length, which then outputs the output text corresponding to the input text. The generation model with controllable output text length is based on the Transformer model and uses a beam search algorithm to determine the output text; it works in the same way as described in the training method embodiments above, so the details are not repeated here.
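As a usage sketch only, inference with the trained model could look like the wrapper below. The model, tokenize, encode, and beam_search names are assumptions (the embodiment specifies the inputs and the beam search decoding, not a concrete API), and length_control_indices is the helper sketched earlier.

```python
def generate_with_length_control(model, tokenize, input_text, max_len, min_len):
    """Hypothetical wrapper: build the two length control index sequences for the
    desired bounds, combine them with the word and position inputs, and decode."""
    word_ids, pos_ids = tokenize(input_text)                       # start symbol + input words
    first, second = length_control_indices(len(word_ids), max_len, min_len)
    encoding = model.encode(word_ids, pos_ids, first, second)      # second Transformer encoder
    return model.beam_search(encoding)                             # output text, length within [min_len, max_len]
```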
Therefore, by applying the generation model with controllable output text length obtained by training with the training method of the foregoing embodiments, the output text produced by the model can meet the length requirements of the text generation task.
Based on the same inventive concept, one or more embodiments of the present specification further provide a training device. Referring to fig. 5, the training apparatus includes:
a first obtaining module 501 configured to obtain a sample input text;
a first determining module 502 configured to input the sample input text into a teacher generated model, resulting in a first probability distribution corresponding to the sample input text;
a first construction module 503 configured to construct a first length control vector and a second length control vector; wherein the first length control vector is used for controlling the maximum length of the output text, and the second length control vector is used for controlling the minimum length of the output text;
a second determining module 504, configured to input the sample input text, the first length control vector, and the second length control vector into a student generation model, so as to obtain a second probability distribution corresponding to the sample input text;
a reward value determination module 505 configured to obtain a reward value for reinforcement learning according to the first probability distribution, the second probability distribution, the first length control vector and the second length control vector;
a first loss determination module 506 configured to derive a loss of the student-generated model based on the reward value and the second probability distribution;
and the first training module 507 is configured to train the student generated model with the minimum loss of the student generated model as a training target so as to obtain a generated model with controllable output text length at the end of training.
As an optional embodiment, the training apparatus further includes: a second loss determination module configured to derive a loss for the teacher-generated model based on the first probability distribution; a second training module configured to train the teacher generated model with a minimum loss of the teacher generated model as a training target.
As an alternative embodiment, the teacher generated model includes: a first transform encoder and a first transform decoder; the first determining module is specifically configured to perform embedding processing on the sample input text to obtain first word vectors corresponding to words in the sample input text; respectively generating a first position vector for each word in the sample input text; linearly combining the first word vector and the first position vector and inputting the combined first word vector and the first position vector into the first Transformer encoder to obtain first encoding vectors corresponding to all words in the sample input text; and inputting the first coding vector into the first transform decoder to obtain a first probability distribution corresponding to the sample input text.
As an alternative embodiment, the student generated model comprises: a second transform encoder and a second transform decoder; the second determining module is specifically configured to perform embedding processing on the sample input text to obtain second word vectors corresponding to words in the sample input text; respectively generating a second position vector for each word in the sample input text; linearly combining the second word vector, the second position vector, the first length control vector and the second length control vector and inputting the combined second word vector, the second position vector, the first length control vector and the second length control vector into the second transform encoder to obtain second encoding vectors corresponding to all words in the sample input text; and inputting the second coding vector into the second transform decoder to obtain a second probability distribution corresponding to the sample input text.
As an optional embodiment, the reward value determination module is specifically configured to determine, according to the first length control vector and the second length control vector, a maximum length and a minimum length of an output text output by the student generation model, respectively; determine the actual length of an output text output by the student generation model; determine a first length difference according to the maximum length and the actual length; determine a second length difference according to the minimum length and the actual length; calculate the cross entropy of the first probability distribution and the second probability distribution; and linearly combine the first length difference, the second length difference, and the cross entropy to obtain the return value.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the modules may be implemented in the same one or more software and/or hardware implementations in implementing one or more embodiments of the present description.
The apparatus of the foregoing embodiment is used to implement the corresponding training method in the foregoing embodiment, and has the beneficial effects of the corresponding training method embodiment, which are not described herein again.
Based on the same inventive concept, one or more embodiments of the present specification further provide a text generation apparatus. Referring to fig. 6, the text generation apparatus includes:
a second obtaining module 601 configured to obtain an input text;
a second construction module 602 configured to construct a first length control vector and a second length control vector; the first length control vector is used for controlling the maximum length of an output text corresponding to the input text, and the second length control vector is used for controlling the minimum length of the output text corresponding to the input text;
the generating module 603 is configured to input the input text, the first length control vector, and the second length control vector into the generated model with controllable output text length, which is obtained by training according to the training method described in any of the above embodiments, so as to obtain an output text corresponding to the input text.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the modules may be implemented in the same one or more software and/or hardware implementations in implementing one or more embodiments of the present description.
The apparatus in the foregoing embodiment is used to implement the corresponding text generation method in the foregoing embodiment, and has the beneficial effects of the corresponding text generation method embodiment, which are not described herein again.
Based on the same inventive concept, one or more embodiments of the present specification further provide an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the training method or the text generation method according to any one of the above embodiments.
Fig. 7 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the present disclosure, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of different aspects of one or more embodiments of the present description as described above, which are not provided in detail for the sake of brevity.
It is intended that the one or more embodiments of the present specification embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (13)

1. A training method for a generation model with controllable output text length, comprising the following steps:
acquiring a sample input text;
inputting the sample input text into a teacher generation model to obtain a first probability distribution corresponding to the sample input text;
constructing a first length control vector and a second length control vector; wherein the first length control vector is used for controlling the maximum length of the output text, and the second length control vector is used for controlling the minimum length of the output text;
inputting the sample input text, the first length control vector and the second length control vector into a student generation model to obtain a second probability distribution corresponding to the sample input text;
obtaining a return value for reinforcement learning according to the first probability distribution, the second probability distribution, the first length control vector and the second length control vector;
obtaining the loss of the student generation model according to the return value and the second probability distribution;
and training the student generated model by taking the minimum loss of the student generated model as a training target so as to obtain a generated model with controllable output text length when training is finished.
2. The method of claim 1, further comprising:
obtaining the loss of the teacher generated model according to the first probability distribution;
and training the teacher generated model by taking the minimum loss of the teacher generated model as a training target.
3. The method of claim 1, the teacher generated model comprising: a first transform encoder and a first transform decoder;
the step of inputting the sample input text into a teacher generation model to obtain a first probability distribution corresponding to the sample input text specifically includes:
embedding the sample input text to obtain first word vectors corresponding to all words in the sample input text;
respectively generating a first position vector for each word in the sample input text;
linearly combining the first word vector and the first position vector, and inputting the combination into the first Transformer encoder to obtain first encoding vectors corresponding to all words in the sample input text;
and inputting the first encoding vectors into the first Transformer decoder to obtain a first probability distribution corresponding to the sample input text.
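For concreteness, the teacher forward pass of claim 3 (word embeddings plus position vectors, a first Transformer encoder, then a first Transformer decoder producing the first probability distribution) could be sketched as follows in PyTorch. The hyper-parameters, the use of learned position embeddings and the choice of a plain sum as the linear combination are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class TeacherGenerator(nn.Module):
    """Sketch of the teacher generation model of claim 3 (illustrative only)."""

    def __init__(self, vocab_size=30000, d_model=512, nhead=8, num_layers=6, max_len=512):
        super().__init__()
        self.word_embedding = nn.Embedding(vocab_size, d_model)      # first word vectors
        self.pos_embedding = nn.Embedding(max_len, d_model)          # first position vectors
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)  # first Transformer encoder
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)  # first Transformer decoder
        self.output_proj = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids, decoder_input_ids):
        # Linearly combine (here: sum) word vectors and position vectors.
        src_pos = torch.arange(input_ids.size(1), device=input_ids.device).unsqueeze(0)
        enc_in = self.word_embedding(input_ids) + self.pos_embedding(src_pos)
        # First encoding vectors for all words of the sample input text.
        memory = self.encoder(enc_in)
        tgt_pos = torch.arange(decoder_input_ids.size(1), device=decoder_input_ids.device).unsqueeze(0)
        dec_in = self.word_embedding(decoder_input_ids) + self.pos_embedding(tgt_pos)
        # Decode against the encoder memory (causal mask omitted for brevity).
        decoded = self.decoder(dec_in, memory)
        # First probability distribution over the vocabulary at every decoding step.
        return torch.softmax(self.output_proj(decoded), dim=-1)
```

The student generation model of claim 4 has the same shape, with a second encoder/decoder pair; only its encoder input differs, as sketched after claim 4.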
4. The method of claim 1, the student generation model comprising: a second Transformer encoder and a second Transformer decoder;
the step of inputting the sample input text, the first length control vector and the second length control vector into a student generation model to obtain a second probability distribution corresponding to the sample input text specifically includes:
embedding the sample input text to obtain second word vectors corresponding to all words in the sample input text;
respectively generating a second position vector for each word in the sample input text;
linearly combining the second word vector, the second position vector, the first length control vector and the second length control vector, and inputting the combination into the second Transformer encoder to obtain second encoding vectors corresponding to all words in the sample input text;
and inputting the second encoding vectors into the second Transformer decoder to obtain a second probability distribution corresponding to the sample input text.
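The student generation model of claim 4 mirrors the teacher sketch above with a second Transformer encoder and a second Transformer decoder; the structural difference is its encoder input, into which the two length control vectors are also folded. An equal-weight sum is assumed here for the linear combination, and the names are illustrative.

```python
import torch

def student_encoder_input(second_word_vectors: torch.Tensor,          # (batch, seq_len, d_model)
                          second_position_vectors: torch.Tensor,      # (batch, seq_len, d_model)
                          first_length_control_vector: torch.Tensor,  # (d_model,)
                          second_length_control_vector: torch.Tensor  # (d_model,)
                          ) -> torch.Tensor:
    """Linearly combine word, position and length control vectors (claim 4).
    The two length control vectors are broadcast over every token position."""
    return (second_word_vectors
            + second_position_vectors
            + first_length_control_vector
            + second_length_control_vector)
```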
5. The method of claim 1, wherein obtaining the reward value for reinforcement learning according to the first probability distribution, the second probability distribution, the first length control vector and the second length control vector specifically comprises:
determining the maximum length and the minimum length of an output text output by the student generation model according to the first length control vector and the second length control vector;
determining the actual length of an output text output by the student generation model;
determining a first length difference according to the maximum length and the actual length;
determining a second length difference according to the minimum length and the actual length;
calculating the cross entropy between the first probability distribution and the second probability distribution;
and linearly combining the first length difference, the second length difference and the cross entropy to obtain the reward value.
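One possible instantiation of the reward of claim 5 is sketched below. The claim fixes only that the two length differences and the cross entropy are linearly combined; the signs of the differences, the weights alpha/beta/gamma, and how the maximum and minimum lengths are decoded from the length control vectors are assumptions of this sketch.

```python
import torch

def reward_value(first_probs: torch.Tensor,    # (seq_len, vocab_size) teacher distribution
                 second_probs: torch.Tensor,   # (seq_len, vocab_size) student distribution
                 max_length: int,              # decoded from the first length control vector
                 min_length: int,              # decoded from the second length control vector
                 actual_length: int,           # length of the text the student produced
                 alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Hypothetical reward: stay below max_length, above min_length, and close
    to the teacher's first probability distribution."""
    first_length_difference = max_length - actual_length    # positive when not too long
    second_length_difference = actual_length - min_length   # positive when not too short
    # Cross entropy H(teacher, student), averaged over decoding steps.
    cross_entropy = -(first_probs * torch.log(second_probs + 1e-9)).sum(dim=-1).mean()
    return (alpha * first_length_difference
            + beta * second_length_difference
            - gamma * float(cross_entropy))
```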
6. A text generation method, comprising:
acquiring an input text;
constructing a first length control vector and a second length control vector; the first length control vector is used for controlling the maximum length of an output text corresponding to the input text, and the second length control vector is used for controlling the minimum length of the output text corresponding to the input text;
inputting the input text, the first length control vector and the second length control vector into a generation model with a controllable output text length obtained by training according to the training method of any one of claims 1 to 5, to obtain an output text corresponding to the input text.
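At inference time (claim 6) the caller builds a length control vector for each bound and feeds them, together with the input text, to the trained length-controllable generation model. The claims do not fix how a scalar length is encoded as a vector; a sinusoidal-style encoding is assumed below, and `tokenizer` and `model.generate` are hypothetical helpers standing in for whatever tokenisation and decoding routine the trained model exposes.

```python
import torch

def length_to_control_vector(length: int, d_model: int = 512) -> torch.Tensor:
    """Hypothetical encoding of a scalar length bound as a d_model-dimensional vector."""
    i = torch.arange(d_model, dtype=torch.float32)
    angles = length / torch.pow(torch.tensor(10000.0), i / d_model)
    return torch.where(i % 2 == 0, torch.sin(angles), torch.cos(angles))

def generate_with_length_bounds(model, tokenizer, input_text: str,
                                min_length: int, max_length: int):
    """Claim 6: construct the two length control vectors and query the trained,
    length-controllable generation model (model.generate is assumed here)."""
    first_length_control_vector = length_to_control_vector(max_length)   # upper bound
    second_length_control_vector = length_to_control_vector(min_length)  # lower bound
    input_ids = tokenizer(input_text)
    return model.generate(input_ids,
                          first_length_control_vector,
                          second_length_control_vector)
```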
7. A training apparatus, comprising:
a first obtaining module configured to obtain a sample input text;
a first determining module configured to input the sample input text into a teacher generation model to obtain a first probability distribution corresponding to the sample input text;
a first construction module configured to construct a first length control vector and a second length control vector; wherein the first length control vector is used for controlling the maximum length of the output text, and the second length control vector is used for controlling the minimum length of the output text;
a second determining module configured to input the sample input text, the first length control vector and the second length control vector into a student generation model to obtain a second probability distribution corresponding to the sample input text;
a reward value determination module configured to derive a reward value for reinforcement learning from the first probability distribution, the second probability distribution, the first length control vector, and the second length control vector;
a first loss determination module configured to obtain the loss of the student generation model according to the reward value and the second probability distribution;
and a first training module configured to train the student generation model by taking the minimum loss of the student generation model as a training target, so as to obtain the generation model with a controllable output text length when training ends.
8. The apparatus of claim 7, further comprising:
a second loss determination module configured to obtain the loss of the teacher generation model according to the first probability distribution;
and a second training module configured to train the teacher generation model by taking the minimum loss of the teacher generation model as a training target.
9. The apparatus of claim 7, the teacher generation model comprising: a first Transformer encoder and a first Transformer decoder;
the first determining module is specifically configured to: perform embedding processing on the sample input text to obtain first word vectors corresponding to the words in the sample input text; generate a first position vector for each word in the sample input text; linearly combine the first word vector and the first position vector and input the combination into the first Transformer encoder to obtain first encoding vectors corresponding to all words in the sample input text; and input the first encoding vectors into the first Transformer decoder to obtain a first probability distribution corresponding to the sample input text.
10. The apparatus of claim 7, the student generation model comprising: a second Transformer encoder and a second Transformer decoder;
the second determining module is specifically configured to: perform embedding processing on the sample input text to obtain second word vectors corresponding to the words in the sample input text; generate a second position vector for each word in the sample input text; linearly combine the second word vector, the second position vector, the first length control vector and the second length control vector and input the combination into the second Transformer encoder to obtain second encoding vectors corresponding to all words in the sample input text; and input the second encoding vectors into the second Transformer decoder to obtain a second probability distribution corresponding to the sample input text.
11. The apparatus of claim 7, the reward value determination module being specifically configured to: determine the maximum length and the minimum length of an output text output by the student generation model according to the first length control vector and the second length control vector, respectively; determine the actual length of an output text output by the student generation model; determine a first length difference according to the maximum length and the actual length; determine a second length difference according to the minimum length and the actual length; calculate the cross entropy between the first probability distribution and the second probability distribution; and linearly combine the first length difference, the second length difference and the cross entropy to obtain the reward value.
12. A text generation apparatus comprising:
a second obtaining module configured to obtain an input text;
a second construction module configured to construct a first length control vector and a second length control vector; the first length control vector is used for controlling the maximum length of an output text corresponding to the input text, and the second length control vector is used for controlling the minimum length of the output text corresponding to the input text;
a generating module configured to input the input text, the first length control vector and the second length control vector into a generation model with a controllable output text length obtained by training according to the training method of any one of claims 1 to 5, so as to obtain an output text corresponding to the input text.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 6 when executing the program.
CN202010689980.6A 2020-07-17 2020-07-17 Training method, text generation device and electronic equipment Active CN111738437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010689980.6A CN111738437B (en) 2020-07-17 2020-07-17 Training method, text generation device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111738437A CN111738437A (en) 2020-10-02
CN111738437B (en) 2020-11-20

Family

ID=72654834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010689980.6A Active CN111738437B (en) 2020-07-17 2020-07-17 Training method, text generation device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111738437B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112527127B (en) * 2020-12-23 2022-01-28 北京百度网讯科技有限公司 Training method and device for input method long sentence prediction model, electronic equipment and medium
CN115130549B (en) * 2022-05-25 2025-05-13 清华大学 Information selection model training method, information selection method and device
CN117787241A (en) * 2023-12-27 2024-03-29 人民网股份有限公司 Method and device for controlling length of generated text based on large language model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016180268A1 (en) * 2015-05-13 2016-11-17 阿里巴巴集团控股有限公司 Text aggregate method and device
CN110147442A (en) * 2019-04-15 2019-08-20 深圳智能思创科技有限公司 A kind of text snippet generation system and method for length-controllable
CN111143551A (en) * 2019-12-04 2020-05-12 支付宝(杭州)信息技术有限公司 Text preprocessing method, classification method, device and equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020027619A1 (en) * 2018-08-02 2020-02-06 네오사피엔스 주식회사 Method, device, and computer readable storage medium for text-to-speech synthesis using machine learning on basis of sequential prosody feature
CN110162630B (en) * 2019-05-09 2025-06-27 深圳市腾讯信息技术有限公司 A method, device and equipment for deduplication of text

Also Published As

Publication number Publication date
CN111738437A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN111738437B (en) Training method, text generation device and electronic equipment
CN114722182B (en) A method and system for online course recommendation based on knowledge graph
WO2022033208A1 (en) Visual dialogue method and apparatus, model training method and apparatus, electronic device, and computer readable storage medium
CN117218498B (en) Multi-modal large language model training method and system based on multi-modal encoder
CN111931517B (en) Text translation method, device, electronic equipment and storage medium
CN110956018A (en) Training method of text processing model, text processing method, text processing device and storage medium
JP2022517971A (en) Language sequence labeling methods, equipment, programs and computing equipment
CN105632251A (en) 3D virtual teacher system having voice function and method thereof
CN106354701A (en) Chinese character processing method and device
CN116993963B (en) Image processing method, device, equipment and storage medium
CN113704419A (en) Conversation processing method and device
CN115424013B (en) Model training methods, image processing methods and equipment, and media
CN114297220B (en) A data processing method, device, computer equipment and storage medium
CN111260516A (en) Data processing method, computer storage medium and related equipment
CN109992785B (en) Content calculation method, device and equipment based on machine learning
CN117332112A (en) Multimodal retrieval model training, multimodal retrieval method, electronic device, and storage medium
CN116958738A (en) Training method and device of picture recognition model, storage medium and electronic equipment
CN117539985A (en) Question-answering method and device based on language style, electronic equipment and storage medium
CN113177393A (en) Method and apparatus for improving pre-trained language model for web page structure understanding
CN117113268B (en) Multi-scale data fusion method, device, medium and electronic equipment
CN115359786B (en) Multi-intent semantic understanding model training and use method and device
CN110377915B (en) Text emotion analysis method and device, storage medium and equipment
KR20120004719A (en) Group Learning Support Method, System, Device and Terminal
Al Ka'bi Proposed Artificial Intelligence Algorithm for Developing Higher Education
CN118887493B (en) Training and reasoning method, device, equipment and medium of CLIP model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant