CN113627199A - Machine translation method and device - Google Patents
- Publication number
- CN113627199A (application number CN202010388975.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The application discloses a machine translation method and apparatus. The method comprises: receiving a source text in a source language; inputting the source text into an encoder and generating the encoded output of the encoder from the hidden representations output by the sub-encoders in the encoder, wherein each sub-encoder comprises at least one neural network layer; and decoding the encoded output with a decoder to obtain a target text in a target language. Because the encoder is partitioned and the encoded output is generated jointly by a plurality of sub-encoders, the encoded output can represent more textual characteristics of the source text; that is, the decoder is given an encoded output containing both low-layer semantic information and high-layer syntactic information, so the source text can be translated more accurately.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a machine translation method and apparatus.
Background
In the information age, computer technology is developing rapidly: more and more repetitive labor can be taken over by computers, and the labor thus freed can be devoted to more innovative and challenging work. In recent years, the rise of artificial intelligence built on deep learning has further accelerated this process.
Machine translation, also called automatic translation, is the process of using a computer to convert one natural language (the source language) into another natural language (the target language). In practice, it is found that the deeper a machine translation model is (the more layers it has), the more feature information it extracts; correspondingly, however, much of the underlying information is lost, and training may fail because of gradient decay.
In summary, current machine translation methods still have shortcomings.
Disclosure of Invention
The embodiments of the present application provide a machine translation method and apparatus, intended at least to solve the technical problems mentioned above.
The embodiments of the present application also provide a machine translation method, comprising: receiving a source text in a source language; inputting the source text into an encoder and generating the encoded output of the encoder from the hidden representations output by the sub-encoders in the encoder, wherein each sub-encoder comprises at least one neural network layer; and decoding the encoded output with a decoder to obtain a target text in a target language.
The embodiment of the present application further provides a machine translation apparatus, including: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the above-described methods.
Embodiments of the present application also provide a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the above methods.
At least one of the technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects:
because the encoder is partitioned and the encoded output is generated jointly by a plurality of sub-encoders, the encoded output can represent more textual characteristics of the source text; that is, the decoder is given an encoded output containing both low-layer semantic information and high-layer syntactic information, so the source text can be translated more accurately.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a scenario diagram illustrating a machine translation system according to an exemplary embodiment of the present application;
FIG. 2 is a block diagram illustrating a machine translation model according to an exemplary embodiment of the present application;
FIG. 3 is a flow chart illustrating a method of machine translation according to an exemplary embodiment of the present application;
FIG. 4 is a diagram illustrating an association of an encoder with a decoder according to an exemplary embodiment of the present application;
FIG. 5 is a diagram illustrating an association of an encoder with a decoder according to an exemplary embodiment of the present application;
fig. 6 is a block diagram illustrating a machine translation device according to an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a machine translation system scenario according to an exemplary embodiment of the present application. The system may include an electronic terminal 11 on the user side and a server 12 that provides services to each electronic terminal; the electronic terminal 11 and the server 12 are connected over the Internet through various gateways, which is not described further here. The number of electronic terminals 11 and servers 12 is not limited to what fig. 1 shows.
The electronic terminal 11 includes, but is not limited to, portable terminals provided with an instant messaging application, such as mobile phones and tablets, and fixed terminals such as computers, inquiry machines, and advertising displays. It is the service port available to the user, hereinafter referred to as the client; the electronic terminal 11 may be, for example, a mobile phone with a chat application installed. In the present application, the client provides an input function for the sentence to be translated, a display function for the translation result, and the like.
The server 12 provides various services for instant messaging users and supplies application-layer support; it includes a data server, a training server, a translation server, a storage server, and the like. The data server preprocesses data, the training server trains the machine translation model, and the translation server performs translation using the trained model.
In the embodiment of the application, after determining the source language to be translated and the target translation language by the translation application on the electronic terminal 11, the electronic terminal inputs the sentence to be translated and sends the sentence to the server 12. After receiving the sentence to be translated, the server 12 invokes the trained machine translation model to translate the sentence to be translated into the target sentence in the target translation language.
Although only a translation application is described above as using the machine translation method of the present application, the usage scenarios are not limited thereto: for example, a chat application may translate chat content into a target sentence according to a user instruction, a shopping application may translate product information into a target sentence, and a video application may translate lyrics or subtitles into a target sentence.
It should be noted that the system scenario diagram shown in fig. 1 is only an example, and the server and the scenario described in the embodiment of the present invention are for more clearly illustrating the technical solution of the embodiment of the present invention, and do not form a limitation on the technical solution provided in the embodiment of the present invention.
The following description will be made in detail in conjunction with fig. 2, wherein fig. 2 shows a block diagram of a machine translation model according to an exemplary embodiment of the present application.
The present application will employ a Neural Machine Translation (NMT) model as shown in fig. 2. The NMT model completely adopts a neural network to complete the translation process from a source language to a target language.
As shown in fig. 2, the NMT model may include an encoder and a decoder. The source sentence may be preprocessed, which may include participle processing and word vector processing, before being input to the encoder.
Word segmentation means converting the source sentence into individual participles according to segmentation units based on statistical word segmentation, where the statistics come from a standard corpus; corpora are generally divided by field, e.g., a medical corpus corresponds to the medical field and an aviation corpus to the aviation field. For example, if the source language is Chinese, the source sentence "Beijing welcomes you" may be segmented into "Beijing", "welcome", "you"; if the source language is English, the source sentence "Welcome to Beijing" may be segmented into "Welcome", "to", "Bei", "jing".
Subsequently, word vector processing may be performed on the participles, that is, the word vector corresponding to each participle is obtained. In this step, Word Embedding may be applied to each participle to obtain the vector corresponding to it. Word embedding obtains a dense vector representation of each element from its context through a trained neural network semantic model; for example, "struggle" might be represented by the vector (0.1, 0.2, 0.3) and "unused" by the vector (0.7, 0.3, 0.3). Since word embedding is a technique commonly used in the art, it is not described further here; those skilled in the art will understand that any method that represents participles as vectors can be applied to the present application.
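As a rough illustration of this lookup step (the vocabulary, the vector values, and the function name below are hypothetical, not taken from the patent), a trained embedding behaves like a table from participles to dense vectors:

```python
# Hypothetical embedding table; in a real system these vectors are
# learned by a neural network semantic model from context.
EMBEDDING = {
    "Beijing": [0.1, 0.2, 0.3],
    "welcome": [0.7, 0.3, 0.3],
    "you":     [0.2, 0.9, 0.1],
}

def embed(participles):
    """Map each participle to its dense word vector."""
    return [EMBEDDING[p] for p in participles]

# The segmented source sentence becomes the encoder's input sequence.
x = embed(["Beijing", "welcome", "you"])
```

The encoder never sees raw text, only this list of vectors.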
As shown in fig. 2, the source sentence is converted into the representation X = (X1, X2, X3, X4, X5) after preprocessing. X is input into the encoder, which converts the source sentence into a hidden-information representation; the decoder reads the hidden information and generates the target sentence Y = (Y1, Y2, Y3, Y4, Y5).
In implementation, more semantic information can be extracted from a source sentence by increasing the number of neural network layers in the encoder: lower layers extract semantic position information and semantic detail, while higher layers extract syntactic information. However, if the encoder simply grows deeper, the decoder may no longer receive the low-layer information the encoder extracted, making the output text less accurate.
In addition, training the NMT model uses a back-propagation algorithm, that is, the target text generated by the NMT model is matched against a human translation of the source text. The error signal of back-propagation must then pass through the entire encoder, and increasing the encoder's depth produces an unstable gradient flow; the difficulty with a deep encoder is therefore that it cannot be trained.
Based on the above, the machine translation method of the present application partitions the multiple neural network layers of the encoder into sub-encoders, takes the hidden representation of each sub-encoder (each containing at least one neural network layer) as an output, lets these outputs act jointly on the decoder, and finally performs decoding with the decoder to obtain the target text in the target language. This is described in detail below in connection with fig. 3.
FIG. 3 shows a flow diagram of a method of machine translation according to an example embodiment of the present application.
In step S310, source text in a source language is received. As an example, the source and target languages may be determined prior to utilizing the machine translation method of the present application, e.g., Chinese may be selected as the source language and English as the target language. The source text may be words or sentences entered by the user. For example, the user may only enter words such as "goodbye", "hello", etc., or may enter a statement such as "where you go today".
In step S320, the source text is input into an encoder, and the encoded output of the encoder is generated using hidden representations of the outputs of respective sub-encoders in the encoder, wherein each sub-encoder includes at least one neural network layer.
In the present application, the encoder may comprise a plurality of neural network layers, that is, the encoder is actually a deep neural network model. The layer structures of the neural network layers can be the same or different, and the neural network layers are connected according to a set structure to jointly form the encoder.
Based on this, the encoder may be divided into a plurality of sub-encoders by the number of neural network layers. As an example, the division mode and the number of sub-encoders may be set according to the layer count; the division modes include equal-layer-count division, arithmetic-progression division, and manual division.
Equal-layer-count division means every sub-encoder contains the same number of neural network layers. For example, an encoder with 15 neural network layers may be divided into 3 sub-encoders of 5 layers each, or into 5 sub-encoders of 3 layers each.
Arithmetic-progression division means the layer counts of successive sub-encoders form an arithmetic sequence. For example, an encoder with 9 neural network layers may be divided into 3 sub-encoders in which the first contains 2 layers, the second 3 layers, and the third 4 layers; or, conversely, the first contains 4 layers, the second 3 layers, and the third 2 layers.
Manual division means a user (e.g., a technician) divides the encoder by hand according to its number of neural network layers. For example, an encoder with 17 neural network layers may be divided, according to a user instruction, into three sub-encoders containing 6, 6, and 5 layers respectively.
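The three division modes above can be sketched as follows (a toy illustration; the helper names are ours, and neural network layers are represented abstractly as list elements):

```python
def split_equal(layers, num_sub):
    """Equal-layer-count division: every sub-encoder gets the same layer count."""
    assert len(layers) % num_sub == 0, "layer count must divide evenly"
    k = len(layers) // num_sub
    return [layers[i * k:(i + 1) * k] for i in range(num_sub)]

def split_by_sizes(layers, sizes):
    """Arithmetic-progression or manual division: explicit layer counts per
    sub-encoder, e.g. [2, 3, 4] as an arithmetic sequence or [6, 6, 5] by hand."""
    assert sum(sizes) == len(layers), "sizes must cover all layers"
    out, start = [], 0
    for s in sizes:
        out.append(layers[start:start + s])
        start += s
    return out

layers = list(range(15))             # stand-ins for 15 neural network layers
three_subs = split_equal(layers, 3)  # 3 sub-encoders of 5 layers each
```

Either helper returns a list of sub-encoders, each a contiguous slice of the original layer stack.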
Subsequently, the plurality of sub-encoders process the input word-segmentation vectors, and the hidden representation output by each sub-encoder is obtained. Specifically, since the sub-encoders are connected in sequence, each sub-encoder performs processing with its input as input information — the word-segmentation vectors for the first sub-encoder, and the output information of the previous sub-encoder thereafter — so that a plurality of hidden representations corresponding respectively to the plurality of sub-encoders are obtained.
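The sequential pass that collects every intermediate hidden representation can be sketched as follows (the toy sub-encoders here just shift their input; real sub-encoders are stacks of neural network layers):

```python
def run_sub_encoders(word_vectors, sub_encoders):
    """Feed the input through sequentially connected sub-encoders and keep
    the hidden representation produced by each one."""
    hidden_reps = []
    h = word_vectors
    for sub_encoder in sub_encoders:
        h = sub_encoder(h)   # input: the previous sub-encoder's output
        hidden_reps.append(h)
    return hidden_reps

# Toy sub-encoders, for illustration only.
subs = [lambda h, k=k: [v + k for v in h] for k in (1.0, 2.0, 3.0)]
reps = run_sub_encoders([0.0, 0.0], subs)  # three hidden representations
```

Unlike a plain deep encoder, which would keep only the final `h`, every element of `hidden_reps` is retained and later exposed to the decoder.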
The encoded output of the encoder is then generated from these hidden representations. As an example, the encoded output may be generated by determining a weight for each hidden representation: the weight is determined by an attention mechanism, that is, the decoded output of the current decoder step is used as the query of the attention model (the attention function), the hidden representation of each sub-encoder is used as a key, and the weight of each sub-encoder is obtained by computing the similarity. Optionally, these weights may be normalized, for example with a softmax function. Finally, the encoded output is determined from the hidden representations and their corresponding weights.
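A minimal sketch of this attention step, under the assumption of dot-product similarity and softmax normalization (both common choices, though the patent does not fix either):

```python
import math

def attention_pool(query, hidden_reps):
    """Score each sub-encoder's hidden representation against the decoder
    query, normalize the scores with softmax, and return the weighted sum
    as the encoded output, together with the weights."""
    scores = [sum(q * h for q, h in zip(query, rep)) for rep in hidden_reps]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]   # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_reps[0])
    encoded = [sum(w * rep[d] for w, rep in zip(weights, hidden_reps))
               for d in range(dim)]
    return encoded, weights
```

A query similar to one sub-encoder's hidden representation pulls the encoded output toward that representation; with no preference the weights are uniform.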
As another example, since the sub-encoders are connected in sequence, their encoded outputs carry timing information. Based on this, the hidden representations output by the sub-encoders can be fed as a single sequence into a neural network layer designed for sequence data, whose output sequence serves as the encoded output of the encoder. Such layers include the Recurrent Neural Network (RNN) and its variants, such as the Gated Recurrent Unit (GRU) model, the bidirectional GRU (BiGRU), and the Long Short-Term Memory (LSTM) model; these are well-known models and are not described in detail here.
In step S330, decoding is performed on the encoded output by a decoder to obtain a target text in a target language.
In the present application, the decoder may include at least one neural network layer; that is, the number of neural network layers in the decoder is not limited. Preferably, the decoder may be divided into as many sub-decoders as there are sub-encoders. In an implementation, the encoded output is input to each sub-decoder, and the sub-decoders process in sequence: the output of the previous sub-decoder and the corresponding encoded output serve as the input of the current sub-decoder, and the output of the last sub-decoder is the target text.
In the present application, after receiving a source text, the source text may be input into a trained machine translation model to obtain a target text in a target language, wherein the machine translation model includes an encoder and a decoder configured as above.
Thus, prior to performing the method as shown in FIG. 3, training may be performed on the machine translation model using a training set. The training set may include a large number of training samples, where each training sample includes a training source text and a corresponding training target text.
Each training source text is then input into the machine translation model to obtain a target text. According to the generated target text and the reference training target text, all network parameters of the machine translation model are adjusted with a back-propagation algorithm until the model meets a preset requirement, for example, until the probability assigned to the reference target text is maximized, thereby completing training of the machine translation model on the training set.
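The parameter-adjustment loop can be illustrated with a toy model and numerical gradients (the real NMT model back-propagates analytic gradients through the whole network; everything below is a deliberate simplification with invented names):

```python
def loss(params, src, tgt):
    """Toy surrogate for the translation loss: a weighted sum of the
    source features must match a target scalar."""
    pred = sum(p * s for p, s in zip(params, src))
    return (pred - tgt) ** 2

def train_step(params, src, tgt, lr=0.1, eps=1e-6):
    """One gradient-descent update, with gradients estimated numerically."""
    base = loss(params, src, tgt)
    grads = []
    for i in range(len(params)):
        bumped = params[:]
        bumped[i] += eps
        grads.append((loss(bumped, src, tgt) - base) / eps)
    return [p - lr * g for p, g in zip(params, grads)]

params = [0.0, 0.0]
for _ in range(50):   # iterate until the loss is small
    params = train_step(params, [1.0, 2.0], 1.0)
```

The stopping condition here is a fixed step budget; the patent's condition (maximizing the probability of the reference text) plays the analogous role for the full model.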
In the training process, because the hidden representation of each sub-encoder acts on the corresponding decoder, the gradient return path in the training process is greatly reduced, and thus the problem of unstable gradient flow caused by the increase of the depth of a neural network layer included by the encoder is solved.
Compared with the prior art, the method for dividing the encoder and jointly generating the encoded output by using the plurality of sub-encoders enables the encoded output to represent more text characteristics of the source text, namely, the encoded output comprising the lower-layer semantic information and the higher-layer syntax information is provided for the decoder to be processed, so that the source text can be translated more accurately.
The correspondence of the encoder and the decoder will be described below in conjunction with fig. 4 and 5. Fig. 4 and 5 each show a diagram of an association of an encoder and a decoder according to an exemplary embodiment of the present application. It should be noted that the number of sub-encoders and sub-decoders in fig. 4 and 5 is only illustrative, and a user may set different numbers as needed.
As shown in fig. 4, the encoder is divided into five sub-encoders. After the participle X2 is input into the encoder, the first sub-encoder encodes X2 to obtain a first hidden representation, which is input to the second sub-encoder to obtain a second hidden representation, and so on in sequence, yielding the first through fifth hidden representations. A weight is then determined for each hidden representation using an attention mechanism: as an example, the decoder's decoded output Y1 for the previous participle serves as the query, the hidden representations of the five sub-encoders serve as the keys, and the weight of each sub-encoder's output is determined.
Preferably, the decoder may be divided into the same number of sub-decoders as there are sub-encoders, as shown in fig. 4. In this embodiment, each sub-decoder performs decoding using the output information of the previous sub-decoder and the encoded output of the encoder; therefore, the output information of the previous sub-decoder can be used as the query over the current hidden outputs to determine the weight of each sub-encoder for the current sub-decoder. As shown in fig. 4, the output O2 of the second sub-decoder serves as the query and the hidden outputs of the first through fifth sub-encoders serve as the keys; the attention mechanism determines the weight of each hidden output, and the encoded output corresponding to the third sub-decoder is determined from the hidden outputs and their weights. The third sub-decoder then uses this encoded output and O2 to generate its output O3, and so on in turn until the decoding result Y2 is output.
In summary, according to the machine translation method of the exemplary embodiment of the present application, the attention mechanism is used to determine the weight value of the hidden representation output by each sub-encoder, so that the context information of each participle can be better reflected, and the translation result is more accurate.
Furthermore, the present application may also adopt the manner as shown in fig. 5 for the association between the encoder and the decoder.
As shown in fig. 5, the hidden representation output by each sub-encoder is sequentially output to the corresponding BiGRU as time-series data. It should be noted that although BiGRU is shown in fig. 5, all recurrent neural networks that can process temporal data are applicable here.
In particular, the first hidden representation C1 output by the first sub-encoder is input to the first BiGRU, which generates an output L1 from C1 and passes its forward and reverse hidden-layer states to the second BiGRU. The second BiGRU also receives the second hidden representation C2 output by the second sub-encoder, and uses the forward hidden state, the reverse hidden state, and C2 to generate an output L2, passing its own forward and reverse hidden-layer states to the next BiGRU. This continues in sequence until the last sub-encoder (the fifth in the figure) is processed by its corresponding BiGRU, whose output is transmitted to the decoder as the encoded output of the encoder.
In this process, the first sub-decoder may perform decoding using the output L1 of the first BiGRU; the first decoding result is input to the second sub-decoder, which performs decoding using the output L2 of the second BiGRU together with the first decoding result; the second decoding result is input to the next sub-decoder, and so on, until the last sub-decoder outputs the decoding result Y2.
Fig. 5 shows an example with equal numbers of sub-decoders and sub-encoders. In an implementation, if there are fewer sub-decoders than sub-encoders, the last sub-decoder can obtain the decoding result Y2 using the output of its corresponding BiGRU and the output of the previous sub-decoder.
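The bidirectional scan underlying each BiGRU can be sketched as follows (a toy recurrent update stands in for the trained GRU gates; only the left-to-right and right-to-left scanning pattern is the point):

```python
def bidirectional_scan(seq, step):
    """Scan a sequence of sub-encoder hidden representations forward and
    backward, returning the final forward and reverse hidden states that
    a BiGRU would pass on to the next unit."""
    fwd = 0.0
    for x in seq:
        fwd = step(x, fwd)
    bwd = 0.0
    for x in reversed(seq):
        bwd = step(x, bwd)
    return fwd, bwd

# Toy recurrent update: decay the state, then add the new input.
step = lambda x, h: 0.5 * h + x
f, b = bidirectional_scan([1.0, 2.0, 3.0], step)
```

The two states differ because each direction weights recent inputs most heavily, which is exactly why the reverse pass adds information the forward pass alone would discount.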
In summary, the machine translation method of the exemplary embodiments of the present application may feed the hidden representation output by each sub-encoder, as time-series data, into the corresponding RNN model or RNN variant, so that the hidden representation of every sub-encoder is mapped to the decoder side; the corresponding gradient return path is greatly shortened and is not affected by gradient decay.
In implementation, the machine translation method according to an exemplary embodiment of the present application may be executed by an electronic device in which various applications are installed, and the applications may include an instant messenger for office use, an application applied to shopping, and the like.
For example, while the electronic device is running an office instant messaging program (e.g., DingTalk), a user may have difficulty communicating with foreign colleagues. In response to the user's translation request for a source text, the machine translation method of the exemplary embodiments of the present application may translate the source text and obtain and provide the target text to the user.
Similarly, when the electronic device runs a shopping-oriented instant messaging program (e.g., sukikai or Lazada) and the user is faced with a product introduction in the source language (e.g., an English description or specification), the user may issue a translation request for that source text; the application then translates it using the machine translation method of the exemplary embodiments and obtains and provides the target text to the user.
In addition, the machine translation method according to the exemplary embodiment of the present application may be a method executed by an application server corresponding to each application, or may be a method executed by an independent module coupled to the server, that is, the machine translation apparatus according to the exemplary embodiment of the present application may be a module embedded in the application server or a module externally installed in the application server, and the present application is not limited thereto.
Fig. 6 shows a block diagram of a machine translation device of an exemplary embodiment of the present application. Referring to fig. 6, the apparatus includes, at a hardware level, a processor, an internal bus, and a computer-readable storage medium, wherein the computer-readable storage medium includes a volatile memory and a non-volatile memory. The processor reads the corresponding computer program from the non-volatile memory and then runs it. Of course, besides the software implementation, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or logic devices.
Specifically, the processor performs the following operations: receiving a source text in a source language; inputting the source text into an encoder and generating an encoded output of the encoder using the hidden representation output by each sub-encoder in the encoder, wherein each sub-encoder comprises at least one neural network layer; and decoding the encoded output with a decoder to obtain a target text in a target language.
Optionally, after receiving the source text in the source language, the processor further performs: word segmentation processing on the source text to obtain each input word segment of the source text; and word embedding processing on each input word segment to obtain an input word segmentation vector for each input word segment, which is input to the encoder.
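As a rough illustration of this pre-processing step, the sketch below performs naive whitespace word segmentation followed by a toy embedding lookup. The 4-dimensional random embeddings and lazy vocabulary handling are illustrative assumptions for the sketch, not part of the application:

```python
import random

random.seed(0)

def segment(source_text):
    # Naive whitespace segmentation; a real system would use a
    # language-specific word segmenter (e.g. for Chinese).
    return source_text.lower().split()

def embed(tokens, table, dim=4):
    # Look up (or lazily create) one embedding vector per token.
    vectors = []
    for tok in tokens:
        if tok not in table:
            table[tok] = [random.uniform(-1.0, 1.0) for _ in range(dim)]
        vectors.append(table[tok])
    return vectors

embedding_table = {}
tokens = segment("machine translation is useful")
input_vectors = embed(tokens, embedding_table)
```

The resulting input word segmentation vectors are what the encoder consumes in the following steps.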
Optionally, in implementing the step of inputting the source text into the encoder and generating the encoded output of the encoder using the hidden representations output by the sub-encoders, the processor performs: dividing the encoder into a plurality of sub-encoders according to the number of neural network layers; processing the input word segmentation vectors with the plurality of sub-encoders to obtain the hidden representation output by each sub-encoder; and generating the encoded output of the encoder using the hidden representations output by the sub-encoders.
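The division into sub-encoders and the collection of one hidden representation per sub-encoder can be sketched as follows. Each toy sub-encoder combines the original input vectors with the previous sub-encoder's output via a scaled sum; this combination rule is a simplified stand-in for the real "at least one neural network layer" and is an assumption of the sketch:

```python
def sub_encoder(input_vectors, prev_hidden):
    # Each sub-encoder consumes the original input word segmentation
    # vectors together with the previous sub-encoder's output (a
    # residual-style scaled sum standing in for the real layer stack).
    return [[0.6 * (x + h) for x, h in zip(xv, hv)]
            for xv, hv in zip(input_vectors, prev_hidden)]

def encode(input_vectors, num_sub_encoders=3):
    hidden = input_vectors
    hidden_reps = []  # one hidden representation per sub-encoder
    for _ in range(num_sub_encoders):
        hidden = sub_encoder(input_vectors, hidden)
        hidden_reps.append(hidden)
    return hidden_reps

# Two toy tokens with 2-dimensional vectors.
input_vectors = [[1.0, 2.0], [3.0, 4.0]]
hidden_reps = encode(input_vectors)
```

All of the collected hidden representations, not just the last one, are then used to form the encoded output, which is the point of the design.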
Optionally, in implementing the step of processing the input word segmentation vectors with the plurality of sub-encoders to obtain the hidden representation output by each sub-encoder, the processor performs: for each sub-encoder, processing the input word segmentation vector of the current sub-encoder together with the output information of the previous sub-encoder as input information, thereby obtaining a plurality of hidden representations corresponding respectively to the plurality of sub-encoders.
Optionally, in implementing the step of generating the encoded output of the encoder using the hidden representation output by each sub-encoder, the processor performs: generating the encoded output of the encoder by determining a weight for each hidden representation.
Optionally, in implementing the step of generating the encoded output of the encoder by determining a weight corresponding to each hidden representation, the processor performs: determining the weight of each hidden representation using an attention mechanism; and determining the encoded output from each hidden representation and its corresponding weight.
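A minimal sketch of this attention-based weighting: score each hidden representation, softmax the scores into weights, and take the weighted sum as the encoded output. Using the total activation as the score is an illustrative assumption standing in for a learned scoring function:

```python
import math

def attention_combine(hidden_reps):
    # One scalar score per hidden representation; total activation is
    # a stand-in for a learned scoring function.
    scores = [sum(sum(vec) for vec in rep) for rep in hidden_reps]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over sub-encoders
    # Weighted sum of the representations gives the encoded output.
    num_tokens = len(hidden_reps[0])
    dim = len(hidden_reps[0][0])
    encoded = [[sum(w * rep[t][d] for w, rep in zip(weights, hidden_reps))
                for d in range(dim)]
               for t in range(num_tokens)]
    return encoded, weights

# Two hidden representations over two tokens of dimension 2.
reps = [[[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [0.5, 0.5]]]
encoded, weights = attention_combine(reps)
```

Because the weights are learned jointly with the rest of the model in the real system, the decoder can draw on whichever mix of lower-level semantic and higher-level syntactic information is most useful.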
Optionally, in implementing the step of generating the encoded output of the encoder using the hidden representation output by each sub-encoder, the processor performs: inputting the hidden representations output by the sub-encoders as individual sequence data to a neural network layer for sequence data, with the output sequence data serving as the encoded output of the encoder.
Optionally, the neural network layer for sequence data comprises a recurrent neural network.
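A hedged sketch of this variant: the hidden representations of the sub-encoders are treated as a length-N sequence and passed through a toy recurrent cell. The plain tanh recurrence stands in for a real RNN/GRU layer, and the per-representation mean pooling over tokens is an illustrative assumption:

```python
import math

def rnn_over_representations(hidden_reps):
    # Pool each hidden representation to a fixed-size vector
    # (mean over tokens), forming a length-N sequence.
    seq = [[sum(col) / len(rep) for col in zip(*rep)] for rep in hidden_reps]
    # A toy tanh recurrence stands in for a real RNN/GRU cell.
    state = [0.0] * len(seq[0])
    outputs = []
    for x in seq:
        state = [math.tanh(xi + si) for xi, si in zip(x, state)]
        outputs.append(state)
    return outputs  # the output sequence serves as the encoded output

reps = [[[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [0.5, 0.5]],
        [[0.2, 0.8], [0.8, 0.2]]]
encoded_seq = rnn_over_representations(reps)
```

Since the sequence has only as many steps as there are sub-encoders, the gradient path from the decoder back to each sub-encoder is short, which is the stated motivation for this design.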
Optionally, the decoder comprises a plurality of sub-decoders, wherein the number of the plurality of sub-decoders is not constrained by the number of sub-encoders.
Optionally, in implementing the step of decoding the encoded output with a decoder to obtain the target text in the target language, the processor performs: inputting the encoded output to each sub-decoder in the decoder; each sub-decoder performs decoding using the output of the previous sub-decoder together with the corresponding encoded output and passes its decoded output to the next sub-decoder, processing proceeding in sequence, with the output of the last sub-decoder taken as the target text.
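This chain of sub-decoders can be sketched as follows. The averaging combine rule and the zero start state are illustrative assumptions; real sub-decoders would use self-attention over the target side and cross-attention over the encoded output:

```python
def sub_decoder(prev_output, encoded):
    # Each sub-decoder combines the previous sub-decoder's output with
    # the corresponding encoded output (averaging is an assumption).
    return [[0.5 * (p + e) for p, e in zip(pv, ev)]
            for pv, ev in zip(prev_output, encoded)]

def decode(encoded, num_sub_decoders=2):
    # Per the description, the number of sub-decoders need not match
    # the number of sub-encoders. The zero start state is a
    # placeholder for the target-side input; the last sub-decoder's
    # output stands for the target text representation.
    output = [[0.0] * len(vec) for vec in encoded]
    for _ in range(num_sub_decoders):
        output = sub_decoder(output, encoded)
    return output

encoded = [[1.0, 2.0], [3.0, 4.0]]
target_repr = decode(encoded, num_sub_decoders=2)
```

In the real model the final representation would still be projected to the target vocabulary and converted to text; the sketch stops at the representation level.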
Further, the processor performs the following operations: receiving a source text of a source language; inputting the source text into a trained machine translation model, and obtaining a target text of a target language, wherein the machine translation model comprises an encoder and a decoder, the encoder comprises a plurality of sub-encoders, and an encoded output represented by a plurality of hidden representations output by the plurality of sub-encoders is used as an input of the decoder.
The machine translation apparatus of the exemplary embodiments of the present application divides the encoder into sub-encoders and generates the encoded output from all of them jointly, so that the encoded output captures more textual characteristics of the source text: an encoded output containing both lower-level semantic information and higher-level syntactic information is provided to the decoder, enabling a more accurate translation of the source text. Furthermore, since the hidden representation of each sub-encoder is applied to the corresponding decoder, the gradient feedback path during training is greatly shortened. Moreover, using an attention mechanism to determine the weight of the hidden representation output by each sub-encoder better reflects the context information of each word segment, making the translation result more accurate. Further, the hidden representation output by each sub-encoder can be input to an RNN model (or a variant thereof) as time-series data, so that the hidden representation of each sub-encoder is mapped to the decoder side and the corresponding gradient return path is greatly shortened, mitigating the effect of gradient attenuation.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
Claims (13)
1. A method of machine translation, comprising:
receiving a source text of a source language;
inputting the source text into an encoder, generating an encoded output of the encoder using a hidden representation of the output of each sub-encoder in the encoder, wherein each sub-encoder comprises at least one neural network layer;
and decoding the encoded output with a decoder to obtain a target text in a target language.
2. The method of claim 1, wherein receiving source text in a source language further comprises:
performing word segmentation processing on the source text to obtain each input word segmentation of the source text;
performing word embedding processing on each input word segmentation to obtain an input word segmentation vector of each input word segmentation;
the entering the source text into an encoder comprises:
inputting the input word segmentation vector to an encoder.
3. The method of claim 2, wherein inputting the source text into an encoder, generating an encoded output of the encoder using a hidden representation of the output of each sub-encoder in the encoder, comprises:
dividing the encoder into a plurality of sub-encoders according to the number of layers of the neural network layer;
processing the input word segmentation vector by using the plurality of sub-encoders to obtain a hidden representation output by each sub-encoder;
the encoded output of the encoder is generated using the hidden representation of each sub-encoder output.
4. The method of claim 3, wherein performing processing on the input participle vector with the plurality of sub-encoders to obtain a hidden representation of each sub-encoder output comprises:
for each sub-encoder, processing is performed with the input word segmentation vector input from the current sub-encoder and the output information of the previous sub-encoder as input information, thereby acquiring a plurality of hidden representations corresponding to the plurality of sub-encoders, respectively.
5. The method of claim 3, wherein generating the encoded output of the encoder using the hidden representation of each sub-encoder output comprises:
the encoded output of the encoder is generated by determining the weight of each hidden representation.
6. The method of claim 5, wherein generating the encoded output of the encoder by determining a weight corresponding to each hidden representation comprises:
determining a weight for each hidden representation using an attention mechanism;
determining the encoded output from each hidden representation and the corresponding weight.
7. The method of claim 1, wherein generating the encoded output of the encoder using the hidden representation of each sub-encoder output comprises:
the hidden representation output by each sub-encoder is input as individual sequence data to a neural network layer for the sequence data, with the output sequence data being the encoded output of the encoder.
8. The method of claim 7, wherein the neural network layer for sequence data comprises a recurrent neural network.
9. The method of claim 8, wherein the decoder comprises a plurality of sub-decoders, wherein a number of the plurality of sub-decoders is not constrained by a number of sub-encoders.
10. The method of claim 9, wherein performing decoding on the encoded output with a decoder to obtain target text in a target language comprises:
and inputting the encoded output to each of the sub-decoders in the decoder, each of the sub-decoders performing decoding using the output decoded by the previous sub-decoder and the corresponding encoded output and inputting the decoded output to the next sub-decoder, sequentially performing processing, and regarding the output of the last sub-decoder as the target text.
11. A method of machine translation, comprising:
receiving a source text of a source language;
inputting the source text into a trained machine translation model, and obtaining a target text of a target language, wherein the machine translation model comprises an encoder and a decoder, the encoder comprises a plurality of sub-encoders, and an encoded output represented by a plurality of hidden representations output by the plurality of sub-encoders is used as an input of the decoder.
12. A machine translation device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-11.
13. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010388975.1A CN113627199A (en) | 2020-05-09 | 2020-05-09 | Machine translation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010388975.1A CN113627199A (en) | 2020-05-09 | 2020-05-09 | Machine translation method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113627199A true CN113627199A (en) | 2021-11-09 |
Family
ID=78377629
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010388975.1A Pending CN113627199A (en) | 2020-05-09 | 2020-05-09 | Machine translation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113627199A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543200A (en) * | 2018-11-30 | 2019-03-29 | 腾讯科技(深圳)有限公司 | A kind of text interpretation method and device |
US20190171913A1 (en) * | 2017-12-04 | 2019-06-06 | Slice Technologies, Inc. | Hierarchical classification using neural networks |
CN110069790A (en) * | 2019-05-10 | 2019-07-30 | 东北大学 | It is a kind of by translation retroversion to machine translation system and method literally |
CN110192206A (en) * | 2017-05-23 | 2019-08-30 | 谷歌有限责任公司 | Sequence based on attention converts neural network |
CN110442693A (en) * | 2019-07-27 | 2019-11-12 | 中国科学院自动化研究所 | Generation method, device, server and medium are replied message based on artificial intelligence |
- 2020-05-09: Application CN202010388975.1A filed in China (CN); published as CN113627199A; status: Pending
Non-Patent Citations (2)
Title |
---|
Zhang Jinchao; Aishan Wumaier; Maihemuti Maimaiti; Liu Qun: "Large-scale Uyghur-Chinese neural network machine translation model based on multiple encoders and decoders", Journal of Chinese Information Processing, 15 September 2018 (2018-09-15) *
Huang Yuxin; Yu Zhengtao; Xiang Yan; Gao Shengxiang; Guo Junjun: "Text summarization method using hierarchical interactive attention", Journal of Frontiers of Computer Science and Technology, 25 December 2019 (2019-12-25) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112528637B (en) | Text processing model training method, device, computer equipment and storage medium | |
CN110210032B (en) | Text processing method and device | |
CN111738025B (en) | Artificial intelligence based translation method and device, electronic equipment and storage medium | |
CN111931517B (en) | Text translation method, device, electronic equipment and storage medium | |
US11735184B2 (en) | Translation and speech recognition method, apparatus, and device | |
CN110234018B (en) | Multimedia content description generation method, training method, device, equipment and medium | |
CN110263150A (en) | Document creation method, device, computer equipment and storage medium | |
CN108228576B (en) | Text translation method and device | |
CN111783478B (en) | Machine translation quality estimation method, device, equipment and storage medium | |
CN111613215B (en) | Voice recognition method and device | |
CN115048102A (en) | Code generation method, device, equipment and storage medium | |
CN113591493B (en) | Translation model training method and translation model device | |
CN111723194B (en) | Digest generation method, device and equipment | |
US20200334336A1 (en) | Generation of scripted narratives | |
CN115588429A (en) | Error correction method and device for speech recognition | |
CN118035426B (en) | Automatic robot customer service replying method and system based on user image | |
CN110891201B (en) | Text generation method, device, server and storage medium | |
CN116384412B (en) | Dialogue content generation method and device, computer readable storage medium and terminal | |
KR20210067294A (en) | Apparatus and method for automatic translation | |
CN113627199A (en) | Machine translation method and device | |
CN118132979A (en) | Text feature extraction model training, text data processing method and related equipment | |
CN112818688B (en) | Text processing method, device, equipment and storage medium | |
CN116956950A (en) | Machine translation method, apparatus, device, medium, and program product | |
CN111291576B (en) | Method, device, equipment and medium for determining internal representation information quantity of neural network | |
CN115577084B (en) | Prediction method and prediction device for dialogue strategy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||