CN109558604B - Machine translation method and device, electronic equipment and storage medium - Google Patents
- Publication number: CN109558604B
- Application number: CN201811542809.1A
- Authority: CN (China)
- Legal status: Active (as listed by Google Patents; not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Abstract
The embodiments of the invention disclose a machine translation method and device, an electronic device, and a storage medium. The method comprises: translating an initial to-be-translated text in a source language into a corresponding initial candidate translation in a target language; acquiring M similar candidate translations in the target language corresponding to the initial candidate translation, where M is a natural number greater than 1; translating the initial candidate translation into N corresponding similar to-be-translated texts in the source language, where N is a natural number greater than 1; and determining, according to the M similar candidate translations in the target language and the N similar to-be-translated texts in the source language, a target candidate translation in the target language corresponding to the initial to-be-translated text. Because a large number of translation samples can be generated, machine translation quality for low-resource languages can be effectively improved.
Description
Technical Field
Embodiments of the invention relate to the technical field of machine translation, and in particular to a machine translation method and device, an electronic device, and a storage medium.
Background
Machine translation is a technology in which a machine translates a sentence input by a user to obtain a target sentence. Neural machine translation is currently the best-performing approach: the parameters of a neural network model are fitted automatically through training, so that a mapping from the source language to the target language is learned. However, translation with scarce corpora has always been a major challenge. In particular, when translating low-resource (minority) languages, it is difficult for neural machine translation to fit its parameters. For example, large amounts of monolingual Chinese corpora and monolingual Arabic corpora exist, but Chinese-Arabic parallel corpora are relatively scarce; exploiting the large monolingual corpora can alleviate the difficulty of training the network and thus help apply neural machine translation to low-resource languages.
Existing methods usually handle low-resource translation with back-translation: a target-to-source translation model generates source-language samples corresponding to target-language text, and these synthetic translation samples are added to the training data of the source-to-target translation model, improving machine translation for low-resource languages. However, back-translation generates relatively few translation samples, so it cannot effectively improve machine translation for low-resource languages.
Disclosure of Invention
In view of this, embodiments of the present invention provide a machine translation method and device, an electronic device, and a storage medium that can generate a large number of translation samples and thereby effectively improve machine translation for low-resource languages.
In a first aspect, an embodiment of the present invention provides a machine translation method, where the method includes:
translating an initial to-be-translated text in a source language into a corresponding initial candidate translation in a target language;
acquiring M similar candidate translations in the target language corresponding to the initial candidate translation in the target language, wherein M is a natural number greater than 1;
translating the initial candidate translation in the target language into N corresponding similar to-be-translated texts in the source language, wherein N is a natural number greater than 1;
and determining, according to the M similar candidate translations in the target language and the N similar to-be-translated texts in the source language, a target candidate translation in the target language corresponding to the initial to-be-translated text in the source language.
In the foregoing embodiment, acquiring the M similar candidate translations in the target language corresponding to the initial candidate translation in the target language comprises:
determining a target-language sentence representation vector corresponding to the initial candidate translation in the target language;
and acquiring, according to that target-language sentence representation vector, the M similar candidate translations in the target language corresponding to the initial candidate translation.
In the above embodiment, translating the initial candidate translation in the target language into the N corresponding similar to-be-translated texts in the source language comprises:
determining a source-language sentence representation vector corresponding to the initial candidate translation in the target language;
and translating, according to that source-language sentence representation vector, the initial candidate translation into the N corresponding similar to-be-translated texts in the source language.
In the above embodiment, determining, according to the M similar candidate translations in the target language and the N similar to-be-translated texts in the source language, the target candidate translation in the target language corresponding to the initial to-be-translated text in the source language comprises:
determining M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar to-be-translated texts in the source language;
and determining, according to the M × N groups of translation samples, the target candidate translation in the target language corresponding to the initial to-be-translated text in the source language.
In the above embodiment, determining the M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar to-be-translated texts in the source language comprises:
calculating a confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and a confidence between each similar to-be-translated text in the source language and the initial candidate translation in the target language;
and determining the M × N groups of translation samples according to the M similar candidate translations in the target language, the N similar to-be-translated texts in the source language, and these two sets of confidences.
In a second aspect, an embodiment of the present invention provides a machine translation apparatus comprising a translation module, an acquisition module and a determination module, wherein:
the translation module is configured to translate an initial to-be-translated text in a source language into a corresponding initial candidate translation in a target language;
the acquisition module is configured to acquire M similar candidate translations in the target language corresponding to the initial candidate translation in the target language, wherein M is a natural number greater than 1;
the translation module is further configured to translate the initial candidate translation in the target language into N corresponding similar to-be-translated texts in the source language, wherein N is a natural number greater than 1;
and the determination module is configured to determine, according to the M similar candidate translations in the target language and the N similar to-be-translated texts in the source language, a target candidate translation in the target language corresponding to the initial to-be-translated text in the source language.
In the foregoing embodiment, the acquisition module is specifically configured to determine a target-language sentence representation vector corresponding to the initial candidate translation in the target language, and to acquire, according to that vector, the M similar candidate translations in the target language corresponding to the initial candidate translation.
In the above embodiment, the translation module is specifically configured to determine a source-language sentence representation vector corresponding to the initial candidate translation in the target language, and to translate, according to that vector, the initial candidate translation into the N corresponding similar to-be-translated texts in the source language.
In the above embodiment, the determination module is specifically configured to determine M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar to-be-translated texts in the source language, and to determine, according to the M × N groups of translation samples, the target candidate translation in the target language corresponding to the initial to-be-translated text in the source language.
In the foregoing embodiment, the determination module is specifically configured to calculate a confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and a confidence between each similar to-be-translated text in the source language and the initial candidate translation in the target language; and to determine the M × N groups of translation samples according to the M similar candidate translations in the target language, the N similar to-be-translated texts in the source language, and these two sets of confidences.
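The module wiring of the second aspect can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the class names, the injected callables (`forward`, `backward`, `retrieve`, `determine`) and the default M = N = 2 are hypothetical, and the sketch only shows how the three modules hand data to one another.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TranslationModule:
    # Hypothetical: source text -> initial candidate translation (target language).
    forward: Callable[[str], str]
    # Hypothetical: (initial candidate, n) -> n similar source-language texts.
    backward: Callable[[str, int], list[str]]


@dataclass
class AcquisitionModule:
    # Hypothetical: (initial candidate, m) -> m similar target-language candidates.
    retrieve: Callable[[str, int], list[str]]


@dataclass
class DeterminationModule:
    # Hypothetical: (M similar candidates, N similar source texts) -> target candidate.
    determine: Callable[[list[str], list[str]], str]


@dataclass
class MachineTranslationApparatus:
    translation: TranslationModule
    acquisition: AcquisitionModule
    determination: DeterminationModule
    m: int = 2
    n: int = 2

    def __post_init__(self) -> None:
        # The description requires M and N to be natural numbers greater than 1.
        if self.m <= 1 or self.n <= 1:
            raise ValueError("M and N must both be greater than 1")

    def run(self, source_text: str) -> str:
        initial = self.translation.forward(source_text)               # initial candidate
        similar_targets = self.acquisition.retrieve(initial, self.m)  # M similar candidates
        similar_sources = self.translation.backward(initial, self.n)  # N similar source texts
        return self.determination.determine(similar_targets, similar_sources)
```

Plugging in trivial stand-in callables (for example `forward=str.upper`) is enough to exercise the data flow without any real translation model.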
In a third aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the machine translation method according to any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a machine translation method according to any embodiment of the present invention.
Embodiments of the invention provide a machine translation method and device, an electronic device, and a storage medium. An initial to-be-translated text in a source language is first translated into a corresponding initial candidate translation in a target language; M similar candidate translations in the target language corresponding to the initial candidate translation are then acquired; the initial candidate translation is translated into N corresponding similar to-be-translated texts in the source language; and finally, a target candidate translation in the target language corresponding to the initial to-be-translated text is determined according to the M similar candidate translations and the N similar to-be-translated texts. In other words, the technical solution of the invention determines the target candidate translation from M similar candidate translations in the target language and N similar to-be-translated texts in the source language. In existing methods, back-translation generates relatively few translation samples and therefore cannot effectively improve machine translation for low-resource languages. Compared with the prior art, the method, device, electronic device, and storage medium provided by the embodiments can generate a large number of translation samples (M × N groups per input text), effectively improving machine translation for low-resource languages; moreover, the technical solution is simple to implement, easy to popularize, and widely applicable.
Drawings
Fig. 1 is a schematic flowchart of a machine translation method according to the first embodiment of the present invention;
Fig. 2 is a schematic flowchart of a machine translation method according to the second embodiment of the present invention;
Fig. 3 is a schematic flowchart of a machine translation method according to the third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a machine translation apparatus according to the fourth embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to the fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some but not all of the relevant elements of the present invention are shown in the drawings.
Example one
Fig. 1 is a flowchart of a machine translation method according to the first embodiment of the present invention. The method may be executed by a machine translation apparatus or an electronic device, which may be implemented in software and/or hardware and integrated into any intelligent device with a network communication function. As shown in Fig. 1, the machine translation method may include the following steps:
S101: Translate the initial to-be-translated text in the source language into a corresponding initial candidate translation in the target language.
In a specific embodiment of the present invention, the electronic device may translate the initial to-be-translated text in the source language into the corresponding initial candidate translation in the target language, for example by using a statistical machine translation method or an example-based machine translation method.
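As a rough illustration of the example-based option in S101, the sketch below retrieves the stored example whose source side overlaps most with the input and returns its target side. It assumes pre-segmented, space-separated input and a hypothetical in-memory translation memory of (source, target) pairs; a real example-based system would also align and recombine sub-sentential fragments, which this sketch omits.

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two space-segmented sentences."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def example_based_translate(source: str, memory: list[tuple[str, str]]) -> str:
    """Return the target side of the stored (source, target) example whose
    source side overlaps most with `source` (whole-sentence retrieval only)."""
    if not memory:
        raise ValueError("translation memory is empty")
    # Hypothetical memory; max keeps the first example on ties.
    _, best_target = max(memory, key=lambda pair: token_overlap(source, pair[0]))
    return best_target
```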
S102: Acquire M similar candidate translations in the target language corresponding to the initial candidate translation in the target language, where M is a natural number greater than 1.
In a specific embodiment of the present invention, the electronic device may acquire the M similar candidate translations in the target language corresponding to the initial candidate translation, where M is a natural number greater than 1. Specifically, the electronic device may first determine a target-language sentence representation vector corresponding to the initial candidate translation, and then acquire the M similar candidate translations in the target language according to that sentence representation vector.
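The retrieval in S102 can be sketched as nearest-neighbour search over sentence representation vectors. The description names neither a similarity measure nor a candidate pool; this sketch assumes cosine similarity and a hypothetical pool of target-language sentences whose vectors were computed in advance.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_m_similar(
    query: list[float], pool: list[tuple[str, list[float]]], m: int
) -> list[str]:
    """Return the m sentences in `pool` whose vectors are most similar to `query`.

    `pool` is a hypothetical list of (target-language sentence, vector) pairs;
    remove the initial candidate itself beforehand if it is in the pool.
    """
    if m <= 1:
        raise ValueError("M must be greater than 1")
    ranked = sorted(pool, key=lambda item: cosine(query, item[1]), reverse=True)
    return [sentence for sentence, _ in ranked[:m]]
```

A brute-force sort is used for clarity; with a large pool, an approximate nearest-neighbour index would normally replace it.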
S103: Translate the initial candidate translation in the target language into N corresponding similar to-be-translated texts in the source language, where N is a natural number greater than 1.
In a specific embodiment of the present invention, the electronic device may translate the initial candidate translation in the target language into the N corresponding similar to-be-translated texts in the source language, where N is a natural number greater than 1. Specifically, the electronic device may first determine a source-language sentence representation vector corresponding to the initial candidate translation, and then translate the initial candidate translation into the N similar to-be-translated texts in the source language according to that sentence representation vector.
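S103 leaves open how the source-language sentence representation vector is obtained and how it yields N texts. One possible reading, sketched below under that assumption, maps the target-language vector into the source-language vector space with a linear matrix `W` (as in cross-lingual embedding mapping) and then retrieves the N nearest sentences from a hypothetical pool of monolingual source-language sentences. Decoding an N-best list with a target-to-source model would be an equally plausible alternative.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def map_vector(w: list[list[float]], v: list[float]) -> list[float]:
    """Apply a linear map W (one row per source-space dimension) to v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def n_similar_source_texts(
    target_vec: list[float],
    w: list[list[float]],
    source_pool: list[tuple[str, list[float]]],
    n: int,
) -> list[str]:
    """Assumed reading of S103: map the candidate's vector into source space,
    then return the n nearest sentences from a monolingual source pool."""
    if n <= 1:
        raise ValueError("N must be greater than 1")
    source_vec = map_vector(w, target_vec)
    ranked = sorted(source_pool, key=lambda item: cosine(source_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:n]]
```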
S104: Determine, according to the M similar candidate translations in the target language and the N similar to-be-translated texts in the source language, a target candidate translation in the target language corresponding to the initial to-be-translated text in the source language.
In a specific embodiment of the present invention, the electronic device may determine the target candidate translation in the target language corresponding to the initial to-be-translated text in the source language according to the M similar candidate translations and the N similar to-be-translated texts. Specifically, the electronic device may determine M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar to-be-translated texts in the source language, and then determine the target candidate translation according to the M × N groups of translation samples.
In the machine translation method provided by this embodiment, the initial to-be-translated text in the source language is first translated into a corresponding initial candidate translation in the target language; M similar candidate translations in the target language corresponding to the initial candidate translation are then acquired; the initial candidate translation is translated into N corresponding similar to-be-translated texts in the source language; and finally, the target candidate translation in the target language corresponding to the initial to-be-translated text is determined according to the M similar candidate translations and the N similar to-be-translated texts. In existing methods, back-translation generates relatively few translation samples and therefore cannot effectively improve machine translation for low-resource languages. Compared with the prior art, the method provided by this embodiment can generate a large number of translation samples, effectively improving machine translation for low-resource languages; moreover, the technical solution is simple to implement, easy to popularize, and widely applicable.
Example two
Fig. 2 is a flowchart illustrating a machine translation method according to the second embodiment of the present invention. As shown in Fig. 2, the machine translation method may include the following steps:
S201: Translate the initial to-be-translated text in the source language into a corresponding initial candidate translation in the target language.
In a specific embodiment of the present invention, the electronic device may translate the initial to-be-translated text in the source language into the corresponding initial candidate translation in the target language, for example by using a statistical machine translation method or an example-based machine translation method.
S202: Acquire M similar candidate translations in the target language corresponding to the initial candidate translation in the target language, where M is a natural number greater than 1.
In a specific embodiment of the present invention, the electronic device may acquire the M similar candidate translations in the target language corresponding to the initial candidate translation, where M is a natural number greater than 1. Specifically, the electronic device may first determine a target-language sentence representation vector corresponding to the initial candidate translation, and then acquire the M similar candidate translations in the target language according to that sentence representation vector.
S203: Translate the initial candidate translation in the target language into N corresponding similar to-be-translated texts in the source language, where N is a natural number greater than 1.
In a specific embodiment of the present invention, the electronic device may translate the initial candidate translation in the target language into the N corresponding similar to-be-translated texts in the source language, where N is a natural number greater than 1. Specifically, the electronic device may first determine a source-language sentence representation vector corresponding to the initial candidate translation, and then translate the initial candidate translation into the N similar to-be-translated texts in the source language according to that sentence representation vector.
S204: Determine M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar to-be-translated texts in the source language.
In a specific embodiment of the present invention, the electronic device may determine M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar to-be-translated texts in the source language. Specifically, the similar candidate translations in the target language may include similar candidate translation 1, similar candidate translation 2, …, similar candidate translation M, and the similar to-be-translated texts in the source language may include similar to-be-translated text 1, similar to-be-translated text 2, …, similar to-be-translated text N. The electronic device may therefore determine N groups of translation samples by pairing similar candidate translation 1 with each of similar to-be-translated texts 1, 2, …, N; determine another N groups by pairing similar candidate translation 2 with each of similar to-be-translated texts 1, 2, …, N; and so on, yielding M × N groups of translation samples in total.
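The pairing described above is a Cartesian product of the two lists. A minimal sketch follows; the function name is hypothetical, and the sample order matches the description (candidate 1 with texts 1..N, then candidate 2 with texts 1..N, and so on).

```python
from itertools import product


def build_translation_samples(
    similar_targets: list[str], similar_sources: list[str]
) -> list[tuple[str, str]]:
    """Pair each of the M similar target-language candidates with each of the
    N similar source-language texts, giving M*N (source, target) samples."""
    # product() varies the last argument fastest, so all N source texts are
    # paired with candidate 1 before moving on to candidate 2.
    return [(src, tgt) for tgt, src in product(similar_targets, similar_sources)]
```

With M = 3 and N = 2, for instance, this yields six (source, target) samples from a single input sentence.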
Preferably, the electronic device may further calculate a confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and a confidence between each similar to-be-translated text in the source language and the initial candidate translation in the target language; and then determine the M × N groups of translation samples according to the M similar candidate translations, the N similar to-be-translated texts, and these two sets of confidences.
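The description does not define the confidence measure or how the two confidences enter a sample. The sketch below assumes both sets of confidences have already been computed (for instance, as cosine similarities of sentence representation vectors) and, as a further assumption, weights each of the M × N samples by the product of its two confidences.

```python
from itertools import product


def weighted_samples(
    target_conf: dict[str, float], source_conf: dict[str, float]
) -> list[tuple[str, str, float]]:
    """Build M*N (source text, target candidate, weight) samples.

    target_conf maps each similar target-language candidate to its confidence
    against the initial candidate translation; source_conf does the same for
    each similar source-language text. Assumption: weight = product of the two.
    """
    samples = [
        (src, tgt, t_conf * s_conf)
        for (tgt, t_conf), (src, s_conf) in product(target_conf.items(), source_conf.items())
    ]
    # Highest-weight samples first, e.g. for filtering or weighted training.
    samples.sort(key=lambda sample: sample[2], reverse=True)
    return samples
```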
S205: Determine, according to the M × N groups of translation samples, the target candidate translation in the target language corresponding to the initial to-be-translated text in the source language.
In a specific embodiment of the present invention, the electronic device may determine the target candidate translation in the target language corresponding to the initial to-be-translated text in the source language according to the M × N groups of translation samples. Specifically, according to the confidence between each similar candidate translation in the target language and the initial candidate translation, the electronic device may select, from all the similar candidate translations, the one with the highest confidence; according to the confidence between each similar to-be-translated text in the source language and the initial candidate translation, it may select, from all the similar to-be-translated texts, the one with the highest confidence; and it may then determine the target candidate translation in the target language corresponding to the initial to-be-translated text according to the selected similar candidate translation and the selected similar to-be-translated text.
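The selection in S205 amounts to two independent argmax operations. The sketch below returns the highest-confidence similar candidate translation and the highest-confidence similar source text; because the description does not say how these two are then combined into the final target candidate translation, the sketch stops at returning the pair. The function name and the dict-based inputs are illustrative assumptions.

```python
def select_best(
    target_conf: dict[str, float], source_conf: dict[str, float]
) -> tuple[str, str]:
    """Return the similar candidate translation and the similar source text
    with the highest confidence against the initial candidate translation.

    Ties are broken by insertion order (max keeps the first maximum).
    """
    if not target_conf or not source_conf:
        raise ValueError("both confidence tables must be non-empty")
    best_target = max(target_conf, key=target_conf.__getitem__)
    best_source = max(source_conf, key=source_conf.__getitem__)
    return best_target, best_source
```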
In the machine translation method provided by this embodiment, the initial to-be-translated text in the source language is first translated into a corresponding initial candidate translation in the target language; M similar candidate translations in the target language corresponding to the initial candidate translation are then acquired; the initial candidate translation is translated into N corresponding similar to-be-translated texts in the source language; and finally, the target candidate translation in the target language corresponding to the initial to-be-translated text is determined according to the M similar candidate translations and the N similar to-be-translated texts. In existing methods, back-translation generates relatively few translation samples and therefore cannot effectively improve machine translation for low-resource languages. Compared with the prior art, the method provided by this embodiment can generate a large number of translation samples, effectively improving machine translation for low-resource languages; moreover, the technical solution is simple to implement, easy to popularize, and widely applicable.
EXAMPLE III
Fig. 3 is a flowchart of a machine translation method according to the third embodiment of the present invention. As shown in Fig. 3, the machine translation method may include the following steps:
S301: translate the initial text to be translated in the source language into a corresponding initial candidate translation in the target language.
In a specific embodiment of the present invention, the electronic device may translate the initial text to be translated in the source language into the corresponding initial candidate translation in the target language. Specifically, the electronic device may perform this translation using a statistics-based machine translation method or, alternatively, an example-based machine translation method.
S302: determine the target-language sentence representation vectors corresponding to the initial candidate translation in the target language.
In a specific embodiment of the present invention, the electronic device may determine the target-language sentence representation vectors corresponding to the initial candidate translation in the target language. Specifically, the electronic device may segment the initial candidate translation into words and then determine, from the resulting word segments, the target-language sentence representation vectors corresponding to it. The target-language sentence representation vectors may include target-language sentence representation vector 1, target-language sentence representation vector 2, …, and target-language sentence representation vector M, where M is a natural number greater than 1.
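The patent does not specify how the word segments are turned into sentence representation vectors, nor how M distinct vectors arise from a single translation. As a rough illustration only, the Python sketch below assumes a regex tokenizer suited to space-delimited text (Chinese input would need a proper word segmenter) and a hypothetical lookup table `embeddings` of word vectors, and mean-pools the vectors of the known tokens into one sentence vector.

```python
import re
from typing import Dict, List


def tokenize(sentence: str) -> List[str]:
    """Lowercase the sentence and split it into word tokens (stand-in for word segmentation)."""
    return re.findall(r"\w+", sentence.lower())


def sentence_vector(sentence: str, embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Mean-pool the word vectors of known tokens; unknown tokens are skipped."""
    total = [0.0] * dim
    count = 0
    for token in tokenize(sentence):
        vec = embeddings.get(token)
        if vec is None:
            continue  # out-of-vocabulary token: ignored in this sketch
        total = [t + v for t, v in zip(total, vec)]
        count += 1
    if count == 0:
        return total  # no known tokens: return the all-zero vector
    return [t / count for t in total]


# Hypothetical 2-dimensional word vectors, for illustration only.
embeddings = {"the": [1.0, 0.0], "cat": [0.0, 1.0]}
print(sentence_vector("The cat!", embeddings, 2))  # [0.5, 0.5]
```

Mean pooling is only one simple choice; a neural sentence encoder would be a more realistic source of the vectors.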
S303: obtain M similar candidate translations in the target language corresponding to the initial candidate translation, according to the target-language sentence representation vectors corresponding to the initial candidate translation.
In a specific embodiment of the present invention, the electronic device may obtain, according to the target-language sentence representation vectors corresponding to the initial candidate translation in the target language, M similar candidate translations in the target language. Specifically, the electronic device may obtain similar candidate translation 1 in the target language according to target-language sentence representation vector 1, similar candidate translation 2 according to target-language sentence representation vector 2, and so on, up to similar candidate translation M according to target-language sentence representation vector M.
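One plausible reading of S303 is nearest-neighbour retrieval: each target-language sentence representation vector k is used to look up the most similar sentence in a target-language corpus, and that sentence becomes similar candidate translation k. The sketch below assumes that reading, a hypothetical in-memory `corpus` of (sentence, vector) pairs, and cosine similarity as the metric; none of these choices is stated in the patent.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar(query_vectors: List[Sequence[float]],
                     corpus: List[Tuple[str, Sequence[float]]]) -> List[str]:
    """For each query vector k, return the corpus sentence closest to it (candidate k)."""
    results = []
    for query in query_vectors:
        sentence, _ = max(corpus, key=lambda item: cosine(query, item[1]))
        results.append(sentence)
    return results


# Hypothetical target-language corpus with precomputed sentence vectors.
corpus = [("a cat sat", [1.0, 0.0]), ("a dog ran", [0.0, 1.0])]
print(retrieve_similar([[0.9, 0.1], [0.2, 0.8]], corpus))  # ['a cat sat', 'a dog ran']
```

A linear scan is fine for a sketch; a large corpus would normally use an approximate nearest-neighbour index instead.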
S304: determine the source-language sentence representation vectors corresponding to the initial candidate translation in the target language.
In a specific embodiment of the present invention, the electronic device may determine the source-language sentence representation vectors corresponding to the initial candidate translation in the target language. Specifically, the electronic device may segment the initial candidate translation into words and then determine, from the resulting word segments, the source-language sentence representation vectors corresponding to it. The source-language sentence representation vectors may include source-language sentence representation vector 1, source-language sentence representation vector 2, …, and source-language sentence representation vector N, where N is a natural number greater than 1.
S305: translate the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language, according to the source-language sentence representation vectors corresponding to the initial candidate translation.
In a specific embodiment of the present invention, the electronic device may translate, according to the source-language sentence representation vectors corresponding to the initial candidate translation in the target language, the initial candidate translation into N corresponding similar texts to be translated in the source language. Specifically, the electronic device may obtain similar text to be translated 1 in the source language according to source-language sentence representation vector 1, similar text to be translated 2 according to source-language sentence representation vector 2, and so on, up to similar text to be translated N according to source-language sentence representation vector N.
S306: determine M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language.
In a specific embodiment of the present invention, the electronic device may determine M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language. Specifically, the similar candidate translations in the target language may include similar candidate translation 1, similar candidate translation 2, …, and similar candidate translation M; the similar texts to be translated in the source language may include similar text to be translated 1, similar text to be translated 2, …, and similar text to be translated N. The electronic device may therefore pair similar candidate translation 1 with similar texts to be translated 1, 2, …, N to obtain N groups of translation samples; pair similar candidate translation 2 with similar texts to be translated 1, 2, …, N to obtain another N groups; and so on, so that M × N groups of translation samples are obtained in total.
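The pairing in S306 is a Cartesian product of the two lists. A minimal sketch, assuming each translation sample is a (source text, target translation) tuple; the function name `build_samples` is hypothetical:

```python
from itertools import product
from typing import List, Tuple


def build_samples(similar_targets: List[str],
                  similar_sources: List[str]) -> List[Tuple[str, str]]:
    """Pair every similar target translation with every similar source text.

    Returns M x N (source_text, target_translation) tuples, grouped by target
    translation: candidate 1 with texts 1..N, then candidate 2, and so on.
    """
    return [(src, tgt) for tgt, src in product(similar_targets, similar_sources)]


samples = build_samples(["t1", "t2"], ["s1", "s2", "s3"])
print(len(samples))  # 6
print(samples[:3])   # [('s1', 't1'), ('s2', 't1'), ('s3', 't1')]
```

The ordering follows the description above: candidate translation 1 paired with texts 1 to N first, then candidate translation 2, and so on.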
Preferably, the electronic device may further calculate the confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, as well as the confidence between each similar text to be translated in the source language and the initial candidate translation in the target language. It may then determine the M × N groups of translation samples according to the M similar candidate translations in the target language, the N similar texts to be translated in the source language, and these two sets of confidences.
S307: determine the target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the M × N groups of translation samples.
In a specific embodiment of the present invention, the electronic device may determine, according to the M × N groups of translation samples, the target candidate translation in the target language corresponding to the initial text to be translated in the source language. Specifically, based on the confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, the electronic device may select, from all similar candidate translations in the target language, the one with the highest confidence. Likewise, based on the confidence between each similar text to be translated in the source language and the initial candidate translation in the target language, it may select, from all similar texts to be translated in the source language, the one with the highest confidence. It may then determine the target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the selected similar candidate translation and the selected similar text to be translated.
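The patent does not define the confidence measure or how the two confidences are combined into the M × N samples. The sketch below assumes, purely for illustration, that confidence is the cosine similarity between sentence vectors in a shared cross-lingual embedding space, and that each sample is weighted by the product of its two confidences; the function name `weighted_samples` is hypothetical.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def weighted_samples(initial_vec: Sequence[float],
                     targets: List[Tuple[str, Sequence[float]]],
                     sources: List[Tuple[str, Sequence[float]]]) -> List[Tuple[str, str, float]]:
    """Build M x N (source, target, weight) samples, highest weight first.

    Assumption: confidence = cosine similarity to the initial candidate
    translation in a shared cross-lingual space; weight = product of the two.
    """
    target_conf = [(text, cosine(vec, initial_vec)) for text, vec in targets]
    source_conf = [(text, cosine(vec, initial_vec)) for text, vec in sources]
    samples = [(src, tgt, c_tgt * c_src)
               for (tgt, c_tgt), (src, c_src) in product(target_conf, source_conf)]
    samples.sort(key=lambda sample: sample[2], reverse=True)
    return samples


initial = [1.0, 0.0]
targets = [("t1", [1.0, 0.0]), ("t2", [0.0, 1.0])]
sources = [("s1", [1.0, 0.0]), ("s2", [1.0, 1.0])]
for sample in weighted_samples(initial, targets, sources):
    print(sample)
```

Multiplying the confidences means a sample is ranked highly only when both of its sentences stay close to the initial candidate translation; other combinations (minimum, sum) would be equally plausible readings.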
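The selection step in S307 amounts to two argmax operations over the confidences. A minimal sketch, assuming the confidences are already available as hypothetical dictionaries mapping each sentence to its score; the patent leaves open how the two selected sentences are then combined into the final target candidate translation, so the sketch stops at the selection:

```python
from typing import Dict, Tuple


def select_best(target_conf: Dict[str, float],
                source_conf: Dict[str, float]) -> Tuple[str, str]:
    """Return (best similar source text, best similar candidate translation).

    Each dict maps a sentence to its confidence with respect to the initial
    candidate translation; ties go to the first sentence in insertion order.
    """
    best_target = max(target_conf, key=lambda text: target_conf[text])
    best_source = max(source_conf, key=lambda text: source_conf[text])
    return best_source, best_target


print(select_best({"t1": 0.4, "t2": 0.9}, {"s1": 0.8, "s2": 0.3}))  # ('s1', 't2')
```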
In the machine translation method provided by this embodiment of the invention, the initial text to be translated in the source language is first translated into a corresponding initial candidate translation in the target language. Next, M similar candidate translations in the target language corresponding to the initial candidate translation are obtained, and the initial candidate translation is translated into N corresponding similar texts to be translated in the source language. Finally, the target candidate translation in the target language corresponding to the initial text to be translated is determined according to the M similar candidate translations and the N similar texts to be translated. In other words, the technical solution of the present invention determines the target candidate translation from M similar candidate translations in the target language and N similar texts to be translated in the source language. In existing machine translation methods, back-translation generates only a small number of translation samples and therefore cannot effectively improve machine translation quality for low-resource languages. Compared with the prior art, the method provided by this embodiment can generate a large number of translation samples (M × N from a single input text), which can effectively improve machine translation quality for low-resource languages; moreover, the technical solution is simple to implement, easy to popularize, and widely applicable.
EXAMPLE IV
Fig. 4 is a schematic structural diagram of a machine translation apparatus according to the fourth embodiment of the present invention. As shown in Fig. 4, the machine translation apparatus according to this embodiment may include a translation module 401, an obtaining module 402, and a determining module 403, where:
the translation module 401 is configured to translate an initial text to be translated in a source language into a corresponding initial candidate translation in a target language;
the obtaining module 402 is configured to obtain M similar candidate translations in the target language corresponding to the initial candidate translation, where M is a natural number greater than 1;
the translation module 401 is further configured to translate the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language, where N is a natural number greater than 1;
the determining module 403 is configured to determine, according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language, a target candidate translation in the target language corresponding to the initial text to be translated in the source language.
Further, the obtaining module 402 is specifically configured to determine the target-language sentence representation vectors corresponding to the initial candidate translation in the target language, and to obtain, according to these vectors, M similar candidate translations in the target language corresponding to the initial candidate translation.
Further, the translation module 401 is specifically configured to determine the source-language sentence representation vectors corresponding to the initial candidate translation in the target language, and to translate, according to these vectors, the initial candidate translation into N corresponding similar texts to be translated in the source language.
Further, the determining module 403 is specifically configured to determine M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language, and to determine, according to the M × N groups of translation samples, the target candidate translation in the target language corresponding to the initial text to be translated in the source language.
Further, the determining module 403 is specifically configured to calculate the confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, as well as the confidence between each similar text to be translated in the source language and the initial candidate translation in the target language, and to determine the M × N groups of translation samples according to the M similar candidate translations, the N similar texts to be translated, and these two sets of confidences.
The machine translation apparatus described above can execute the method provided by any embodiment of the present invention and has the functional modules and beneficial effects corresponding to that method. For technical details not described in this embodiment, refer to the machine translation method provided by any embodiment of the present invention.
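To show how the three modules of the apparatus cooperate, the sketch below wires them together with plain Python callables. The class name, the field names, and the default values of M and N are hypothetical; each callable stands in for whatever concrete model an implementation would use, and the toy stand-ins at the end exist only so the wiring can be run.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (similar source text, similar candidate translation)


@dataclass
class MachineTranslationApparatus:
    # Translation module 401: source text -> initial candidate translation.
    translate: Callable[[str], str]
    # Translation module 401: initial candidate -> N similar source texts.
    back_translate: Callable[[str, int], List[str]]
    # Obtaining module 402: initial candidate -> M similar candidate translations.
    obtain_similar: Callable[[str, int], List[str]]
    # Determining module 403: M x N samples -> target candidate translation.
    determine: Callable[[List[Sample]], str]
    m: int = 2
    n: int = 2

    def run(self, source_text: str) -> str:
        initial = self.translate(source_text)
        targets = self.obtain_similar(initial, self.m)
        sources = self.back_translate(initial, self.n)
        samples = [(src, tgt) for tgt in targets for src in sources]
        return self.determine(samples)


# Toy stand-ins so the wiring can be exercised end to end.
apparatus = MachineTranslationApparatus(
    translate=lambda text: text.upper(),
    back_translate=lambda cand, n: [f"{cand.lower()}~{j}" for j in range(n)],
    obtain_similar=lambda cand, m: [f"{cand}#{i}" for i in range(m)],
    determine=lambda samples: samples[0][1],
)
print(apparatus.run("hi"))  # HI#0
```

Injecting the modules as callables keeps the orchestration independent of any particular translation or retrieval model.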
EXAMPLE V
Fig. 5 is a schematic structural diagram of an electronic device according to the fifth embodiment of the present invention, showing a block diagram of an exemplary electronic device suitable for implementing embodiments of the present invention. The electronic device 12 shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in FIG. 5, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, and commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
The processing unit 16 executes various functional applications and data processing, such as implementing a machine translation method provided by an embodiment of the present invention, by executing programs stored in the system memory 28.
EXAMPLE VI
The sixth embodiment of the invention provides a computer storage medium.
The computer-readable storage media of embodiments of the invention may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A method of machine translation, the method comprising:
translating an initial text to be translated in a source language into a corresponding initial candidate translation in a target language;
obtaining M similar candidate translations in the target language corresponding to the initial candidate translation in the target language, wherein M is a natural number greater than 1;
translating the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language, wherein N is a natural number greater than 1;
determining M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language; and determining, according to the M × N groups of translation samples, a target candidate translation in the target language corresponding to the initial text to be translated in the source language.
2. The method according to claim 1, wherein obtaining the M similar candidate translations in the target language corresponding to the initial candidate translation in the target language comprises:
determining target-language sentence representation vectors corresponding to the initial candidate translation in the target language;
and obtaining, according to the target-language sentence representation vectors, the M similar candidate translations in the target language corresponding to the initial candidate translation in the target language.
3. The method according to claim 1, wherein translating the initial candidate translation in the target language into the N corresponding similar texts to be translated in the source language comprises:
determining source-language sentence representation vectors corresponding to the initial candidate translation in the target language;
and translating, according to the source-language sentence representation vectors, the initial candidate translation in the target language into the N corresponding similar texts to be translated in the source language.
4. The method according to claim 1, wherein determining the M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language comprises:
calculating the confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and the confidence between each similar text to be translated in the source language and the initial candidate translation in the target language;
and determining the M × N groups of translation samples according to the M similar candidate translations in the target language, the N similar texts to be translated in the source language, and the two sets of confidences so calculated.
5. A machine translation apparatus, comprising a translation module, an obtaining module, and a determining module, wherein:
the translation module is configured to translate an initial text to be translated in a source language into a corresponding initial candidate translation in a target language;
the obtaining module is configured to obtain M similar candidate translations in the target language corresponding to the initial candidate translation in the target language, wherein M is a natural number greater than 1;
the translation module is further configured to translate the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language, wherein N is a natural number greater than 1;
the determining module is configured to determine M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language, and to determine, according to the M × N groups of translation samples, a target candidate translation in the target language corresponding to the initial text to be translated in the source language.
6. The apparatus of claim 5, wherein:
the obtaining module is specifically configured to determine target-language sentence representation vectors corresponding to the initial candidate translation in the target language, and to obtain, according to these vectors, the M similar candidate translations in the target language corresponding to the initial candidate translation in the target language.
7. The apparatus of claim 5, wherein:
the translation module is specifically configured to determine source-language sentence representation vectors corresponding to the initial candidate translation in the target language, and to translate, according to these vectors, the initial candidate translation in the target language into the N corresponding similar texts to be translated in the source language.
8. The apparatus of claim 5, wherein:
the determining module is specifically configured to calculate the confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and the confidence between each similar text to be translated in the source language and the initial candidate translation in the target language, and to determine the M × N groups of translation samples according to the M similar candidate translations in the target language, the N similar texts to be translated in the source language, and the two sets of confidences so calculated.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the machine translation method of any of claims 1-4.
10. A storage medium on which a computer program is stored, the program, when executed by a processor, implementing a machine translation method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811542809.1A CN109558604B (en) | 2018-12-17 | 2018-12-17 | Machine translation method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811542809.1A CN109558604B (en) | 2018-12-17 | 2018-12-17 | Machine translation method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109558604A CN109558604A (en) | 2019-04-02 |
CN109558604B true CN109558604B (en) | 2022-06-14 |
Family
ID=65870267
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811542809.1A Active CN109558604B (en) | 2018-12-17 | 2018-12-17 | Machine translation method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109558604B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110175335B (en) * | 2019-05-08 | 2023-05-09 | 北京百度网讯科技有限公司 | Translation model training method and device |
CN111079449B (en) * | 2019-12-19 | 2023-04-11 | 北京百度网讯科技有限公司 | Method and device for acquiring parallel corpus data, electronic equipment and storage medium |
CN112487830B (en) * | 2020-11-09 | 2024-05-28 | 文思海辉智科科技有限公司 | Translation memory operation execution method and device, computer equipment and storage medium |
CN112633019B (en) * | 2020-12-29 | 2023-09-05 | 北京奇艺世纪科技有限公司 | Bilingual sample generation method and device, electronic equipment and storage medium |
CN112686059B (en) * | 2020-12-29 | 2024-04-16 | 中国科学技术大学 | Text translation method, device, electronic device and storage medium |
CN115797815B (en) * | 2021-09-08 | 2023-12-15 | 荣耀终端有限公司 | AR translation processing method and electronic equipment |
CN114896991B (en) * | 2022-04-26 | 2023-02-28 | 北京百度网讯科技有限公司 | Text translation method and device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8185375B1 (en) * | 2007-03-26 | 2012-05-22 | Google Inc. | Word alignment with bridge languages |
CN106484682A (en) * | 2015-08-25 | 2017-03-08 | 阿里巴巴集团控股有限公司 | Based on the machine translation method of statistics, device and electronic equipment |
CN106649288A (en) * | 2016-12-12 | 2017-05-10 | 北京百度网讯科技有限公司 | Translation method and device based on artificial intelligence |
CN108932231A (en) * | 2017-05-26 | 2018-12-04 | 华为技术有限公司 | Machine translation method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9465797B2 (en) * | 2012-02-23 | 2016-10-11 | Google Inc. | Translating text using a bridge language |
Also Published As
Publication number | Publication date |
---|---|
CN109558604A (en) | 2019-04-02 |
Similar Documents
Publication | Title |
---|---|
CN109558604B (en) | Machine translation method and device, electronic equipment and storage medium |
CN111090628B (en) | Data processing method and device, storage medium and electronic equipment |
US11314946B2 (en) | Text translation method, device, and storage medium |
US11640551B2 (en) | Method and apparatus for recommending sample data |
CN109635305B (en) | Voice translation method and device, equipment and storage medium |
US20190065624A1 (en) | Method and device for obtaining answer, and computer device |
US9766868B2 (en) | Dynamic source code generation |
CN109697292B (en) | Machine translation method, device, electronic equipment and medium |
CN109408834B (en) | Auxiliary machine translation method, device, equipment and storage medium |
US9619209B1 (en) | Dynamic source code generation |
CN112580339B (en) | Model training method and device, electronic equipment and storage medium |
CN108536686B (en) | Picture translation method, device, terminal and storage medium |
CN111597800B (en) | Method, device, equipment and storage medium for obtaining synonyms |
CN115310460A (en) | Machine translation quality evaluation method, device, equipment and storage medium |
US11354504B2 (en) | Multi-lingual action identification |
CN109657127A (en) | A kind of answer acquisition methods, device, server and storage medium |
CN109189332A (en) | A kind of disk hanging method, device, server and storage medium |
CN110472241B (en) | Method for generating redundancy-removed information sentence vector and related equipment |
CN110807334A (en) | Text processing method, device, medium and computing equipment |
CN109062973A (en) | A kind of method for digging, device, server and the storage medium of question and answer resource |
CN115204197A (en) | Machine translation model training method, device, equipment and storage medium |
CN114595702A (en) | Text translation model training method, text translation method and related device |
CN109086328B (en) | Method and device for determining upper and lower position relation, server and storage medium |
CN109036379B (en) | Speech recognition method, apparatus and storage medium |
CN111488768B (en) | Style conversion method and device for face image, electronic equipment and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
GR01 | Patent grant |