
CN109558604B - Machine translation method and device, electronic equipment and storage medium - Google Patents

Machine translation method and device, electronic equipment and storage medium

Info

Publication number
CN109558604B
Authority
CN
China
Prior art keywords
translation
target
similar
target language
language
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811542809.1A
Other languages
Chinese (zh)
Other versions
CN109558604A (en)
Inventor
张睿卿
何中军
吴华
王海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811542809.1A
Publication of CN109558604A
Application granted
Publication of CN109558604B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention discloses a machine translation method, a machine translation device, electronic equipment and a storage medium. The method comprises the following steps: translating an initial text to be translated in a source language into a corresponding initial candidate translation in a target language; acquiring M similar candidate translations in the target language corresponding to the initial candidate translation, wherein M is a natural number greater than 1; translating the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language, wherein N is a natural number greater than 1; and determining a target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language. Because a large number of translation samples can be generated, the machine translation quality for low-resource languages can be effectively improved.

Description

Machine translation method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of machine translation, in particular to a machine translation method, a machine translation device, electronic equipment and a storage medium.
Background
Machine translation is a technology in which a machine translates a sentence input by a user to obtain a target sentence. Neural machine translation is currently the best-performing approach: the parameters of a neural network model are fitted automatically through training, thereby realizing the mapping from the source language to the target language. However, translation with scarce corpora has always been a great challenge. In particular, when translating low-resource (minority) languages, it is difficult for a neural machine translation method to fit its parameters. For example, large monolingual Chinese corpora and large monolingual Arabic corpora exist, but Chinese-Arabic parallel corpora are relatively scarce; making use of the large monolingual corpora can alleviate the difficulty of training the network, which helps neural machine translation to be applied to low-resource languages.
Existing machine translation methods usually use back-translation for low-resource languages: a translation model from the target language to the source language generates a source-language sample for each target-language sentence, and the resulting translation samples are added to the training of the translation model from the source language to the target language, thereby improving the machine translation quality for low-resource languages. However, back-translation generates only a small number of translation samples, so it cannot effectively improve the machine translation quality for low-resource languages.
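To make the limitation concrete, the following is a minimal sketch of back-translation in Python. The `reverse_translate` callable and the toy lookup table are hypothetical stand-ins for a trained target-to-source model; they are not part of the patent. The point is only that each target-language sentence yields exactly one synthetic sample.

```python
from typing import Callable, List, Tuple


def back_translate(
    target_sentences: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (synthetic_source, target) pairs, one per target sentence."""
    return [(reverse_translate(t), t) for t in target_sentences]


# Hypothetical toy table standing in for a target-to-source translation model.
toy_table = {"hello": "你好", "thanks": "谢谢"}
pairs = back_translate(["hello", "thanks"], lambda t: toy_table[t])
```

With K monolingual target sentences this produces only K samples, which is the shortfall the background describes.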
Disclosure of Invention
In view of this, embodiments of the present invention provide a machine translation method, an apparatus, an electronic device, and a storage medium, which can generate a large number of translation samples, so as to effectively improve the machine translation effect of rare languages.
In a first aspect, an embodiment of the present invention provides a machine translation method, where the method includes:
translating an initial text to be translated in a source language into a corresponding initial candidate translation in a target language;
acquiring M similar candidate translations in the target language corresponding to the initial candidate translation in the target language; wherein M is a natural number greater than 1;
translating the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language; wherein N is a natural number greater than 1;
and determining a target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language.
In the foregoing embodiment, acquiring the M similar candidate translations in the target language corresponding to the initial candidate translation in the target language includes:
determining a sentence representation vector in the target language corresponding to the initial candidate translation in the target language;
and acquiring the M similar candidate translations in the target language corresponding to the initial candidate translation according to that sentence representation vector.
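The text does not specify how the sentence representation vector is used to find similar candidates. One plausible reading, shown here purely as an assumption, is nearest-neighbour retrieval by cosine similarity over a pool of target-language sentences. The function names and the candidate pool are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_m_similar(
    query_vec: Sequence[float],
    pool: Dict[str, Sequence[float]],
    m: int,
) -> List[str]:
    """Return the m sentences in the pool whose vectors are closest to query_vec."""
    ranked = sorted(pool, key=lambda s: cosine(query_vec, pool[s]), reverse=True)
    return ranked[:m]


# Hypothetical pool of target-language sentences with toy 2-d vectors.
pool = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
nearest = top_m_similar([1.0, 0.0], pool, m=2)
```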
In the above embodiment, translating the initial candidate translation in the target language into the N corresponding similar texts to be translated in the source language includes:
determining a sentence representation vector in the source language corresponding to the initial candidate translation in the target language;
and translating the initial candidate translation in the target language into the N corresponding similar texts to be translated in the source language according to that sentence representation vector.
In the above embodiment, determining the target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language includes:
determining M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language;
and determining the target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the M × N groups of translation samples.
In the above embodiment, determining the M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language includes:
calculating a confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and a confidence between each similar text to be translated in the source language and the initial candidate translation in the target language;
and determining the M × N groups of translation samples according to the M similar candidate translations in the target language, the N similar texts to be translated in the source language, and the two sets of confidences calculated above.
In a second aspect, an embodiment of the present invention provides a machine translation apparatus, where the apparatus includes: the device comprises a translation module, an acquisition module and a determination module; wherein,
the translation module is used for translating the initial text to be translated in the source language into a corresponding initial candidate translation in the target language;
the acquisition module is used for acquiring M similar candidate translations in the target language corresponding to the initial candidate translation in the target language; wherein M is a natural number greater than 1;
the translation module is further used for translating the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language; wherein N is a natural number greater than 1;
the determining module is used for determining the target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language.
In the foregoing embodiment, the acquisition module is specifically configured to determine a sentence representation vector in the target language corresponding to the initial candidate translation in the target language, and to acquire the M similar candidate translations in the target language corresponding to the initial candidate translation according to that sentence representation vector.
In the above embodiment, the translation module is specifically configured to determine a sentence representation vector in the source language corresponding to the initial candidate translation in the target language, and to translate the initial candidate translation in the target language into the N corresponding similar texts to be translated in the source language according to that sentence representation vector.
In the above embodiment, the determining module is specifically configured to determine M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language, and to determine the target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the M × N groups of translation samples.
In the foregoing embodiment, the determining module is specifically configured to calculate a confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and a confidence between each similar text to be translated in the source language and the initial candidate translation in the target language; and to determine the M × N groups of translation samples according to the M similar candidate translations in the target language, the N similar texts to be translated in the source language, and the two sets of confidences.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a machine translation method as described in any of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a machine translation method according to any embodiment of the present invention.
The embodiment of the invention provides a machine translation method, a machine translation device, electronic equipment and a storage medium. An initial text to be translated in a source language is first translated into a corresponding initial candidate translation in a target language; M similar candidate translations in the target language corresponding to the initial candidate translation are then acquired; the initial candidate translation in the target language is translated into N corresponding similar texts to be translated in the source language; and finally a target candidate translation in the target language corresponding to the initial text to be translated in the source language is determined according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language. In existing machine translation methods, back-translation generates only a small number of translation samples and therefore cannot effectively improve the machine translation quality for low-resource languages. Compared with the prior art, the machine translation method, device, electronic equipment and storage medium provided by the embodiment of the invention can generate a large number of translation samples, so that the machine translation quality for low-resource languages can be effectively improved; moreover, the technical scheme of the embodiment of the invention is simple to implement, easy to popularize and widely applicable.
Drawings
Fig. 1 is a schematic flowchart of a machine translation method according to a first embodiment of the present invention;
Fig. 2 is a schematic flowchart of a machine translation method according to a second embodiment of the present invention;
Fig. 3 is a schematic flowchart of a machine translation method according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a machine translation apparatus according to a fourth embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some but not all of the relevant elements of the present invention are shown in the drawings.
Example one
Fig. 1 is a flowchart of a machine translation method according to an embodiment of the present invention, where the method may be executed by a machine translation apparatus or an electronic device, where the apparatus or the electronic device may be implemented by software and/or hardware, and the apparatus or the electronic device may be integrated in any intelligent device with a network communication function. As shown in fig. 1, the machine translation method may include the steps of:
S101, translating the initial text to be translated in the source language into a corresponding initial candidate translation in the target language.
In a specific embodiment of the present invention, the electronic device may translate the initial text to be translated in the source language into the initial candidate translation in the target language corresponding to the initial text to be translated in the source language. Specifically, the electronic device may translate an initial text to be translated in a source language into an initial candidate translation in a target language corresponding to the initial text to be translated by using a machine translation method based on statistics; or, the electronic device may also translate the initial text to be translated in the source language into an initial candidate translation in the target language corresponding to the initial text to be translated by using an example-based machine translation method.
S102, acquiring M similar candidate translations in the target language corresponding to the initial candidate translation in the target language; wherein M is a natural number greater than 1.
In a specific embodiment of the present invention, the electronic device may acquire M similar candidate translations in the target language corresponding to the initial candidate translation in the target language. Specifically, the electronic device may first determine a sentence representation vector in the target language corresponding to the initial candidate translation in the target language, and then acquire the M similar candidate translations in the target language according to that sentence representation vector.
S103, translating the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language; wherein N is a natural number greater than 1.
In a specific embodiment of the present invention, the electronic device may translate the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language. Specifically, the electronic device may first determine a sentence representation vector in the source language corresponding to the initial candidate translation in the target language, and then translate the initial candidate translation in the target language into the N similar texts to be translated in the source language according to that sentence representation vector.
S104, determining a target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language.
In a specific embodiment of the present invention, the electronic device may determine the target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language. Specifically, the electronic device may determine M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language, and then determine the target candidate translation in the target language according to the M × N groups of translation samples.
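Steps S101 to S104 can be read as a small pipeline. The sketch below wires them together in Python under stated assumptions: the three callables (`forward`, `similar_targets`, `back_to_sources`) are hypothetical placeholders for the models the text leaves unspecified, and the toy lambdas exist only so the flow can be run. The final selection of a target candidate translation is not modelled here.

```python
from typing import Callable, List, Tuple


def build_samples(
    source_text: str,
    forward: Callable[[str], str],                     # S101: source -> target
    similar_targets: Callable[[str, int], List[str]],  # S102: M similar candidates
    back_to_sources: Callable[[str, int], List[str]],  # S103: N similar source texts
    m: int,
    n: int,
) -> List[Tuple[str, str]]:
    """Run S101-S103, then pair every source text with every candidate."""
    initial = forward(source_text)
    targets = similar_targets(initial, m)
    sources = back_to_sources(initial, n)
    # First half of S104: M x N (source, target) translation samples.
    return [(s, t) for t in targets for s in sources]


# Toy stand-ins (hypothetical) so the sketch is runnable.
samples = build_samples(
    "早上好",
    forward=lambda s: "good morning",
    similar_targets=lambda t, m: [f"{t} #{i}" for i in range(m)],
    back_to_sources=lambda t, n: [f"src{i}" for i in range(n)],
    m=2,
    n=3,
)
```

Passing the models in as callables keeps the orchestration independent of whichever translation and retrieval components are actually used.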
The machine translation method provided by the embodiment of the invention first translates an initial text to be translated in a source language into a corresponding initial candidate translation in a target language; then acquires M similar candidate translations in the target language corresponding to the initial candidate translation; translates the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language; and finally determines the target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language. In existing machine translation methods, back-translation generates only a small number of translation samples and therefore cannot effectively improve the machine translation quality for low-resource languages. Compared with the prior art, the machine translation method provided by the embodiment of the invention can generate a large number of translation samples, so that the machine translation quality for low-resource languages can be effectively improved; moreover, the technical scheme of the embodiment of the invention is simple to implement, easy to popularize and widely applicable.
Example two
Fig. 2 is a flowchart illustrating a machine translation method according to a second embodiment of the present invention. As shown in fig. 2, the machine translation method may include the steps of:
S201, translating the initial text to be translated in the source language into a corresponding initial candidate translation in the target language.
In a specific embodiment of the present invention, the electronic device may translate the initial text to be translated in the source language into the initial candidate translation in the target language corresponding to the initial text to be translated. Specifically, the electronic device may translate an initial text to be translated in a source language into an initial candidate translation in a target language corresponding to the initial text to be translated by using a machine translation method based on statistics; or, the electronic device may also translate the initial text to be translated in the source language into an initial candidate translation in the target language corresponding to the initial text to be translated by using an example-based machine translation method.
S202, acquiring M similar candidate translations in the target language corresponding to the initial candidate translation in the target language; wherein M is a natural number greater than 1.
In a specific embodiment of the present invention, the electronic device may acquire M similar candidate translations in the target language corresponding to the initial candidate translation in the target language. Specifically, the electronic device may first determine a sentence representation vector in the target language corresponding to the initial candidate translation in the target language, and then acquire the M similar candidate translations in the target language according to that sentence representation vector.
S203, translating the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language; wherein N is a natural number greater than 1.
In a specific embodiment of the present invention, the electronic device may translate the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language. Specifically, the electronic device may first determine a sentence representation vector in the source language corresponding to the initial candidate translation in the target language, and then translate the initial candidate translation in the target language into the N similar texts to be translated in the source language according to that sentence representation vector.
S204, determining M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language.
In a specific embodiment of the present invention, the electronic device may determine M × N groups of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language. Specifically, the similar candidate translations in the target language may include: similar candidate translation 1, similar candidate translation 2, …, similar candidate translation M; the similar texts to be translated in the source language may include: similar text to be translated 1, similar text to be translated 2, …, similar text to be translated N. The electronic device may therefore determine N groups of translation samples from similar candidate translation 1 paired with similar texts to be translated 1, 2, …, N; N groups of translation samples from similar candidate translation 2 paired with similar texts to be translated 1, 2, …, N; and so on. In this way, M × N groups of translation samples can be determined from the M similar candidate translations in the target language and the N similar texts to be translated in the source language.
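The pairing described above is a Cartesian product of the two lists. A minimal Python sketch follows; the example strings are invented placeholders, not data from the patent.

```python
from itertools import product
from typing import List, Tuple


def pair_samples(
    similar_candidates: List[str],
    similar_sources: List[str],
) -> List[Tuple[str, str]]:
    """Pair each of the M candidates with each of the N source texts."""
    # product() iterates the second list fastest, so the N groups for
    # candidate 1 come first, then the N groups for candidate 2, and so on.
    return [(src, cand) for cand, src in product(similar_candidates, similar_sources)]


candidates = ["cand1", "cand2"]     # M = 2 (placeholder strings)
sources = ["src1", "src2", "src3"]  # N = 3 (placeholder strings)
groups = pair_samples(candidates, sources)
```

The output grows as M × N, whereas back-translation yields a single sample per sentence.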
Preferably, the electronic device may further calculate a confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and a confidence between each similar text to be translated in the source language and the initial candidate translation in the target language; and then determine the M × N groups of translation samples according to the M similar candidate translations in the target language, the N similar texts to be translated in the source language, and the two sets of confidences.
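The text does not state how the two confidences enter the M × N groups. As one possible interpretation, the sketch below attaches a weight to every sample equal to the product of the two confidences. Both the product rule and the `Sample` type are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Sample:
    source: str
    target: str
    weight: float


def weighted_samples(
    target_conf: Dict[str, float],  # similar candidate translation -> confidence
    source_conf: Dict[str, float],  # similar source text -> confidence
) -> List[Sample]:
    """Build M x N samples, each weighted by the product of its two confidences."""
    return [
        # Assumption: the pair weight is the product of the two confidences.
        Sample(source=s, target=t, weight=ct * cs)
        for t, ct in target_conf.items()
        for s, cs in source_conf.items()
    ]


weighted = weighted_samples({"t1": 0.9, "t2": 0.5}, {"s1": 0.8, "s2": 0.4})
```

A downstream trainer could use `weight` to scale each sample's contribution, though the source does not say whether it does so.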
S205, determining a target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the M × N groups of translation samples.
In a specific embodiment of the present invention, the electronic device may determine, according to the M × N groups of translation samples, the target candidate translation in the target language corresponding to the initial text to be translated in the source language. Specifically, according to the confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, the electronic device may select, from all similar candidate translations in the target language, the one with the highest confidence. Likewise, according to the confidence between each similar text to be translated in the source language and the initial candidate translation in the target language, the electronic device may select, from all similar texts to be translated in the source language, the one with the highest confidence. The electronic device then determines the target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the selected similar candidate translation and the selected similar text to be translated.
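The two selections described above are each an arg-max over a confidence table. A minimal Python sketch follows, with invented confidence values; how the selected pair is then turned into the final target candidate translation is not specified in the text and is not modelled here.

```python
from typing import Dict, Tuple


def pick_best(
    target_conf: Dict[str, float],
    source_conf: Dict[str, float],
) -> Tuple[str, str]:
    """Return (best_source_text, best_candidate) by highest confidence."""
    # max() raises ValueError on an empty dict, so both tables must be non-empty.
    best_target = max(target_conf, key=target_conf.get)
    best_source = max(source_conf, key=source_conf.get)
    return best_source, best_target


# Invented confidences against the initial candidate translation.
best = pick_best({"cand1": 0.62, "cand2": 0.91}, {"src1": 0.75, "src2": 0.40})
```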
The machine translation method provided by the embodiment of the invention first translates an initial text to be translated in a source language into a corresponding initial candidate translation in a target language; then acquires M similar candidate translations in the target language corresponding to the initial candidate translation; translates the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language; and finally determines the target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language. In existing machine translation methods, back-translation generates only a small number of translation samples and therefore cannot effectively improve the machine translation quality for low-resource languages. Compared with the prior art, the machine translation method provided by the embodiment of the invention can generate a large number of translation samples, so that the machine translation quality for low-resource languages can be effectively improved; moreover, the technical scheme of the embodiment of the invention is simple to implement, easy to popularize and widely applicable.
Example three
Fig. 3 is a flowchart illustrating a machine translation method according to a third embodiment of the present invention. As shown in fig. 3, the machine translation method may include the steps of:
S301, translating the initial text to be translated in the source language into its corresponding initial candidate translation in the target language.
In a specific embodiment of the present invention, the electronic device may translate the initial text to be translated in the source language into its corresponding initial candidate translation in the target language. Specifically, the electronic device may perform this translation by using a statistical machine translation method; alternatively, the electronic device may perform it by using an example-based machine translation method.
S302, determining the sentence representation vectors in the target language corresponding to the initial candidate translation in the target language.
In a specific embodiment of the present invention, the electronic device may determine the sentence representation vectors in the target language corresponding to the initial candidate translation in the target language. Specifically, the electronic device may divide the initial candidate translation in the target language into a plurality of word segments, and then determine, according to the word segments, the sentence representation vectors in the target language corresponding to the initial candidate translation in the target language. The sentence representation vectors in the target language may include: sentence representation vector 1 in the target language, sentence representation vector 2 in the target language, …, and sentence representation vector M in the target language; wherein M is a natural number greater than 1.
S303, obtaining, according to the sentence representation vectors in the target language corresponding to the initial candidate translation in the target language, the M similar candidate translations in the target language corresponding to the initial candidate translation in the target language.
In a specific embodiment of the present invention, the electronic device may obtain, according to the sentence representation vectors in the target language corresponding to the initial candidate translation in the target language, the M similar candidate translations in the target language corresponding to the initial candidate translation in the target language. Specifically, the electronic device may obtain similar candidate translation 1 in the target language according to sentence representation vector 1 in the target language; obtain similar candidate translation 2 in the target language according to sentence representation vector 2 in the target language; and so on, until similar candidate translation M in the target language is obtained according to sentence representation vector M in the target language.
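The embodiment does not specify how a sentence representation vector is computed from the word segments or how a similar candidate translation is obtained from it. The following is only a minimal sketch under stated assumptions: the sentence vector is taken to be the mean of per-word embeddings, similarity is taken to be cosine similarity, and the M candidates are retrieved as the nearest neighbours of a single vector in a pool of target-language sentences (rather than from M separate vectors as the text describes). The embedding table and the names `sentence_vector`, `cosine` and `top_m_similar` are hypothetical.

```python
import math

# Hypothetical toy word embeddings; a real system would use learned vectors.
EMBEDDINGS = {
    "the": [0.1, 0.0, 0.1],
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "sat": [0.0, 0.9, 0.1],
    "ran": [0.0, 0.8, 0.3],
    "stock": [0.0, 0.1, 0.9],
    "fell": [0.1, 0.3, 0.8],
}
DIMENSION = 3


def sentence_vector(sentence):
    """Assumed representation: mean of the embeddings of the known word segments."""
    vectors = [EMBEDDINGS[w] for w in sentence.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIMENSION
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_m_similar(initial_candidate, pool, m):
    """Return the m pool sentences closest to the initial candidate translation."""
    query = sentence_vector(initial_candidate)
    scored = [
        (cosine(query, sentence_vector(s)), s) for s in pool if s != initial_candidate
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:m]]
```

For instance, calling `top_m_similar("the cat sat", pool, 2)` with a pool of target-language sentences returns the two sentences whose averaged vectors lie closest to that of the initial candidate translation.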
S304, determining the sentence representation vectors in the source language corresponding to the initial candidate translation in the target language.
In a specific embodiment of the present invention, the electronic device may determine the sentence representation vectors in the source language corresponding to the initial candidate translation in the target language. Specifically, the electronic device may divide the initial candidate translation in the target language into a plurality of word segments, and then determine, according to the word segments, the sentence representation vectors in the source language corresponding to the initial candidate translation in the target language. The sentence representation vectors in the source language may include: sentence representation vector 1 in the source language, sentence representation vector 2 in the source language, …, and sentence representation vector N in the source language; wherein N is a natural number greater than 1.
S305, translating, according to the sentence representation vectors in the source language corresponding to the initial candidate translation in the target language, the initial candidate translation in the target language into the N corresponding similar texts to be translated in the source language.
In a specific embodiment of the present invention, the electronic device may translate, according to the sentence representation vectors in the source language corresponding to the initial candidate translation in the target language, the initial candidate translation in the target language into the N corresponding similar texts to be translated in the source language. Specifically, the electronic device may obtain similar text to be translated 1 in the source language according to sentence representation vector 1 in the source language; obtain similar text to be translated 2 in the source language according to sentence representation vector 2 in the source language; and so on, until similar text to be translated N in the source language is obtained according to sentence representation vector N in the source language.
S306, determining M × N sets of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language.
In a specific embodiment of the present invention, the electronic device may determine M × N sets of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language. Specifically, the similar candidate translations in the target language may include: similar candidate translation 1, similar candidate translation 2, …, and similar candidate translation M; the similar texts to be translated in the source language may include: similar text to be translated 1, similar text to be translated 2, …, and similar text to be translated N. Therefore, the electronic device may determine N sets of translation samples from similar candidate translation 1 together with similar text to be translated 1, similar text to be translated 2, …, and similar text to be translated N; determine N sets of translation samples from similar candidate translation 2 together with similar text to be translated 1, similar text to be translated 2, …, and similar text to be translated N; and so on, so that M × N sets of translation samples can be determined according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language.
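The pairing described above is a Cartesian product of the two lists. A minimal sketch, assuming each translation sample is simply a (source text, target translation) pair; the function name `build_translation_samples` is hypothetical:

```python
from itertools import product


def build_translation_samples(similar_targets, similar_sources):
    """Pair every similar candidate translation with every similar source text.

    Samples are grouped by candidate translation: the first N samples all use
    similar candidate translation 1, the next N use candidate translation 2,
    and so on, giving M * N samples in total.
    """
    return [
        (source, target)
        for target, source in product(similar_targets, similar_sources)
    ]
```

With M = 2 candidate translations and N = 3 similar source texts, this yields 6 samples.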
Preferably, the electronic device may further calculate the confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and the confidence between each similar text to be translated in the source language and the initial candidate translation in the target language; and then determine the M × N sets of translation samples according to the M similar candidate translations in the target language, the N similar texts to be translated in the source language, the confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and the confidence between each similar text to be translated in the source language and the initial candidate translation in the target language.
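The embodiment does not state how the two confidences enter into the translation samples. One possible reading, shown here purely as an assumption, is that each sample carries both confidences and a combined weight equal to their product. The class `WeightedSample`, its `weight` property, and `build_weighted_samples` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedSample:
    source: str
    target: str
    source_confidence: float
    target_confidence: float

    @property
    def weight(self) -> float:
        # Assumption: combine the two confidences by multiplication.
        return self.source_confidence * self.target_confidence


def build_weighted_samples(target_confidences, source_confidences):
    """Build M * N samples from {text: confidence} mappings for each side."""
    return [
        WeightedSample(source, target, s_conf, t_conf)
        for target, t_conf in target_confidences.items()
        for source, s_conf in source_confidences.items()
    ]
```

A downstream step could then rank or filter the samples by `weight`; that use is likewise an assumption and not something the embodiment prescribes.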
S307, determining, according to the M × N sets of translation samples, the target candidate translation in the target language corresponding to the initial text to be translated in the source language.
In a specific embodiment of the present invention, the electronic device may determine, according to the M × N sets of translation samples, the target candidate translation in the target language corresponding to the initial text to be translated in the source language. Specifically, based on the confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, the electronic device may select, from all the similar candidate translations in the target language, the one with the highest confidence; based on the confidence between each similar text to be translated in the source language and the initial candidate translation in the target language, it may select, from all the similar texts to be translated in the source language, the one with the highest confidence; and it may then determine the target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the selected similar candidate translation and the selected similar text to be translated.
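The two selections in S307 are each an argmax over confidences. A minimal sketch, assuming the confidences are available as {text: confidence} mappings; how the final target candidate translation is derived from the selected pair is not specified in the embodiment, so the sketch stops at returning the pair. The names `most_confident` and `select_anchor_pair` are hypothetical.

```python
def most_confident(confidences):
    """Return the text with the highest confidence (first one wins on ties)."""
    if not confidences:
        raise ValueError("no candidates to choose from")
    return max(confidences, key=confidences.get)


def select_anchor_pair(target_confidences, source_confidences):
    """Pick the best similar source text and the best similar candidate translation."""
    best_source = most_confident(source_confidences)
    best_target = most_confident(target_confidences)
    return best_source, best_target
```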
In the machine translation method provided by this embodiment of the invention, the initial text to be translated in the source language is first translated into its corresponding initial candidate translation in the target language; M similar candidate translations in the target language corresponding to the initial candidate translation are then obtained; the initial candidate translation in the target language is translated into N corresponding similar texts to be translated in the source language; and finally, the target candidate translation in the target language corresponding to the initial text to be translated in the source language is determined according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language. That is to say, in the technical solution of the present invention, the target candidate translation in the target language corresponding to the initial text to be translated in the source language may be determined according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language. In existing machine translation methods, the back-translation method generates only a small number of translation samples and therefore cannot effectively improve the machine translation quality for scarce (low-resource) languages. Compared with the prior art, the machine translation method provided by this embodiment of the invention can generate a large number of translation samples, so that the machine translation quality for scarce languages can be effectively improved; moreover, the technical solution of this embodiment is simple to implement, easy to popularize, and widely applicable.
Example four
Fig. 4 is a schematic structural diagram of a machine translation apparatus according to a fourth embodiment of the present invention. As shown in fig. 4, a machine translation apparatus according to an embodiment of the present invention may include: a translation module 401, an acquisition module 402 and a determination module 403; wherein,
the translation module 401 is configured to translate an initial text to be translated in a source language into its corresponding initial candidate translation in a target language;
the obtaining module 402 is configured to obtain M similar candidate translations in the target language corresponding to the initial candidate translation in the target language; wherein M is a natural number greater than 1;
the translation module 401 is further configured to translate the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language; wherein N is a natural number greater than 1;
the determining module 403 is configured to determine, according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language, a target candidate translation in the target language corresponding to the initial text to be translated in the source language.
Further, the obtaining module 402 is specifically configured to determine the sentence representation vectors in the target language corresponding to the initial candidate translation in the target language; and to obtain, according to those sentence representation vectors, the M similar candidate translations in the target language corresponding to the initial candidate translation in the target language.
Further, the translation module 401 is specifically configured to determine the sentence representation vectors in the source language corresponding to the initial candidate translation in the target language; and to translate, according to those sentence representation vectors, the initial candidate translation in the target language into the N corresponding similar texts to be translated in the source language.
Further, the determining module 403 is specifically configured to determine M × N sets of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language; and to determine, according to the M × N sets of translation samples, the target candidate translation in the target language corresponding to the initial text to be translated in the source language.
Further, the determining module 403 is specifically configured to calculate the confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and the confidence between each similar text to be translated in the source language and the initial candidate translation in the target language; and to determine the M × N sets of translation samples according to the M similar candidate translations in the target language, the N similar texts to be translated in the source language, the confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and the confidence between each similar text to be translated in the source language and the initial candidate translation in the target language.
The machine translation device can execute the method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to a machine translation method provided in any embodiment of the present invention.
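The division of work among the three modules can be pictured as a small composition of callables. The following is only an illustrative sketch: the class `MachineTranslationApparatus`, its field names, and the callable signatures are assumptions, and the module behaviour is injected rather than implemented.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MachineTranslationApparatus:
    # Translation module 401, forward direction: source text -> initial candidate.
    translate: Callable[[str], str]
    # Translation module 401, reverse direction: initial candidate -> N source texts.
    back_translate: Callable[[str, int], List[str]]
    # Obtaining module 402: initial candidate -> M similar candidate translations.
    acquire_similar: Callable[[str, int], List[str]]
    # Determining module 403: (similar targets, similar sources) -> target candidate.
    determine: Callable[[List[str], List[str]], str]

    def run(self, source_text: str, m: int, n: int) -> str:
        """Chain the modules in the order the method embodiments describe."""
        initial = self.translate(source_text)
        similar_targets = self.acquire_similar(initial, m)
        similar_sources = self.back_translate(initial, n)
        return self.determine(similar_targets, similar_sources)
```

Injecting the module behaviour keeps the sketch independent of any particular translation model or retrieval method.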
Example five
Fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention. FIG. 5 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 5 is only an example and should not bring any limitation to the function and the scope of use of the embodiment of the present invention.
As shown in FIG. 5, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, and commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing, such as implementing a machine translation method provided by an embodiment of the present invention, by executing programs stored in the system memory 28.
Example six
The sixth embodiment of the invention provides a computer storage medium.
The computer-readable storage media of embodiments of the invention may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method of machine translation, the method comprising:
translating an initial text to be translated in a source language into its corresponding initial candidate translation in a target language;
acquiring M similar candidate translations in the target language corresponding to the initial candidate translation in the target language; wherein M is a natural number greater than 1;
translating the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language; wherein N is a natural number greater than 1;
determining M × N sets of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language; and determining a target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the M × N sets of translation samples.
2. The method according to claim 1, wherein the acquiring of the M similar candidate translations in the target language corresponding to the initial candidate translation in the target language comprises:
determining a sentence representation vector in the target language corresponding to the initial candidate translation in the target language;
and acquiring, according to the sentence representation vector in the target language corresponding to the initial candidate translation in the target language, the M similar candidate translations in the target language corresponding to the initial candidate translation in the target language.
3. The method of claim 1, wherein the translating of the initial candidate translation in the target language into the N corresponding similar texts to be translated in the source language comprises:
determining a sentence representation vector in the source language corresponding to the initial candidate translation in the target language;
and translating, according to the sentence representation vector in the source language corresponding to the initial candidate translation in the target language, the initial candidate translation in the target language into the N corresponding similar texts to be translated in the source language.
4. The method of claim 1, wherein the determining of the M × N sets of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language comprises:
calculating the confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and the confidence between each similar text to be translated in the source language and the initial candidate translation in the target language;
and determining the M × N sets of translation samples according to the M similar candidate translations in the target language, the N similar texts to be translated in the source language, the confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and the confidence between each similar text to be translated in the source language and the initial candidate translation in the target language.
5. A machine translation apparatus, the apparatus comprising: a translation module, an acquisition module and a determination module; wherein,
the translation module is used for translating an initial text to be translated in a source language into its corresponding initial candidate translation in a target language;
the acquisition module is used for acquiring M similar candidate translations in the target language corresponding to the initial candidate translation in the target language; wherein M is a natural number greater than 1;
the translation module is further used for translating the initial candidate translation in the target language into N corresponding similar texts to be translated in the source language; wherein N is a natural number greater than 1;
the determination module is used for determining M × N sets of translation samples according to the M similar candidate translations in the target language and the N similar texts to be translated in the source language; and determining a target candidate translation in the target language corresponding to the initial text to be translated in the source language according to the M × N sets of translation samples.
6. The apparatus of claim 5, wherein:
the acquisition module is specifically configured to determine a sentence representation vector in the target language corresponding to the initial candidate translation in the target language; and to acquire, according to that sentence representation vector, the M similar candidate translations in the target language corresponding to the initial candidate translation in the target language.
7. The apparatus of claim 5, wherein:
the translation module is specifically configured to determine a sentence representation vector in the source language corresponding to the initial candidate translation in the target language; and to translate, according to that sentence representation vector, the initial candidate translation in the target language into the N corresponding similar texts to be translated in the source language.
8. The apparatus of claim 5, wherein:
the determination module is specifically configured to calculate the confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and the confidence between each similar text to be translated in the source language and the initial candidate translation in the target language; and to determine the M × N sets of translation samples according to the M similar candidate translations in the target language, the N similar texts to be translated in the source language, the confidence between each similar candidate translation in the target language and the initial candidate translation in the target language, and the confidence between each similar text to be translated in the source language and the initial candidate translation in the target language.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the machine translation method of any of claims 1-4.
10. A storage medium on which a computer program is stored, the program, when executed by a processor, implementing a machine translation method according to any one of claims 1 to 4.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811542809.1A CN109558604B (en) 2018-12-17 2018-12-17 Machine translation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109558604A CN109558604A (en) 2019-04-02
CN109558604B true CN109558604B (en) 2022-06-14

Family

ID=65870267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811542809.1A Active CN109558604B (en) 2018-12-17 2018-12-17 Machine translation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109558604B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175335B (en) * 2019-05-08 2023-05-09 北京百度网讯科技有限公司 Translation model training method and device
CN111079449B (en) * 2019-12-19 2023-04-11 北京百度网讯科技有限公司 Method and device for acquiring parallel corpus data, electronic equipment and storage medium
CN112487830B (en) * 2020-11-09 2024-05-28 文思海辉智科科技有限公司 Translation memory operation execution method and device, computer equipment and storage medium
CN112633019B (en) * 2020-12-29 2023-09-05 北京奇艺世纪科技有限公司 Bilingual sample generation method and device, electronic equipment and storage medium
CN112686059B (en) * 2020-12-29 2024-04-16 中国科学技术大学 Text translation method, device, electronic device and storage medium
CN115797815B (en) * 2021-09-08 2023-12-15 荣耀终端有限公司 AR translation processing method and electronic equipment
CN114896991B (en) * 2022-04-26 2023-02-28 北京百度网讯科技有限公司 Text translation method and device, electronic equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
US8185375B1 (en) * 2007-03-26 2012-05-22 Google Inc. Word alignment with bridge languages
CN106484682A (en) * 2015-08-25 2017-03-08 阿里巴巴集团控股有限公司 Based on the machine translation method of statistics, device and electronic equipment
CN106649288A (en) * 2016-12-12 2017-05-10 北京百度网讯科技有限公司 Translation method and device based on artificial intelligence
CN108932231A (en) * 2017-05-26 2018-12-04 华为技术有限公司 Machine translation method and device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9465797B2 (en) * 2012-02-23 2016-10-11 Google Inc. Translating text using a bridge language

Also Published As

Publication number Publication date
CN109558604A (en) 2019-04-02

Similar Documents

Publication Title
CN109558604B (en) Machine translation method and device, electronic equipment and storage medium
CN111090628B (en) Data processing method and device, storage medium and electronic equipment
US11314946B2 (en) Text translation method, device, and storage medium
US11640551B2 (en) Method and apparatus for recommending sample data
CN109635305B (en) Voice translation method and device, equipment and storage medium
US20190065624A1 (en) Method and device for obtaining answer, and computer device
US9766868B2 (en) Dynamic source code generation
CN109697292B (en) Machine translation method, device, electronic equipment and medium
CN109408834B (en) Auxiliary machine translation method, device, equipment and storage medium
US9619209B1 (en) Dynamic source code generation
CN112580339B (en) Model training method and device, electronic equipment and storage medium
CN108536686B (en) Picture translation method, device, terminal and storage medium
CN111597800B (en) Method, device, equipment and storage medium for obtaining synonyms
CN115310460A (en) Machine translation quality evaluation method, device, equipment and storage medium
US11354504B2 (en) Multi-lingual action identification
CN109657127A (en) A kind of answer acquisition methods, device, server and storage medium
CN109189332A (en) A kind of disk hanging method, device, server and storage medium
CN110472241B (en) Method for generating redundancy-removed information sentence vector and related equipment
CN110807334A (en) Text processing method, device, medium and computing equipment
CN109062973A (en) A kind of method for digging, device, server and the storage medium of question and answer resource
CN115204197A (en) Machine translation model training method, device, equipment and storage medium
CN114595702A (en) Text translation model training method, text translation method and related device
CN109086328B (en) Method and device for determining upper and lower position relation, server and storage medium
CN109036379B (en) Speech recognition method, apparatus and storage medium
CN111488768B (en) Style conversion method and device for face image, electronic equipment and storage medium

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant