Detailed Description
The semantic-based machine translation method of the present invention will be described with reference first to FIG. 1. FIG. 1 is a flow diagram of a semantic-based machine translation method according to one embodiment of the present invention. The machine translation method of the present invention, unlike the prior art, is based on semantics. For a better understanding of the present invention, the following explanations are given with respect to some of the terms and concepts related to the present invention:
In natural language, a unit that expresses a meaning is called a semantic unit, e.g., "engineer".
The form that expresses a semantic unit in any particular natural language (e.g., English, Chinese, etc.) is referred to as the semantic unit representation of that semantic unit in the particular natural language. For example, the Chinese representation of the semantic unit "engineer" is "工程师" and the English representation is "engineer".
The semantics of a sentence in any particular natural language is called a sentence meaning, e.g., "I am a student". A sentence meaning is composed of semantic units. For example, the sentence meaning "I am a student" is composed of three semantic units: "I", "student", and "is_title(<who>, <what job title>)". Here, <who> and <what job title> are two parameters, each of which needs to be replaced by a semantic unit.
The parameters of a semantic unit with parameters can be replaced, i.e. substituted, with semantic units. A semantic unit whose parameters have been replaced becomes a semantic unit substitution formula, which is a composite semantic unit. A sentence meaning can be written as a sentence meaning expression, i.e. a semantic unit in which all parameters have been replaced, such as: is_title(I, student).
A sentence in a particular natural language is a representation of a sentence meaning in that natural language. For example, the Chinese representation of "is_title(I, student)" is "我是学生" and the English representation is "I am a student".
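The notions of semantic unit, parameter substitution, and sentence meaning expression introduced above can be illustrated with a small data structure. The following Python sketch is only one possible encoding; the class name `Unit` and the mnemonic `is_title` are illustrative assumptions and are not part of the described method.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Unit:
    """A semantic unit; `args` holds the units substituted for its parameters."""
    name: str                      # hypothetical mnemonic, e.g. "is_title"
    args: Tuple["Unit", ...] = ()  # empty for a unit without parameters

    def __str__(self) -> str:
        if not self.args:
            return self.name
        return f"{self.name}({', '.join(str(a) for a in self.args)})"


# is_title(<who>, <what job title>) with both parameters substituted
# becomes the sentence meaning expression for "I am a student".
i_am_a_student = Unit("is_title", (Unit("I"), Unit("student")))
expr_text = str(i_am_a_student)
```

Because a substituted unit is itself a `Unit`, it can in turn be used as a parameter of another unit, which is what makes the substitution formula a composite semantic unit.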
The semantic language is composed of all semantic units. It is unified and independent of any specific natural language. A particular natural language is composed of all the semantic unit representations of that natural language. A specific natural language can therefore be seen as a representation of the semantic language.
Different natural languages can be translated into one another, and people who use different natural languages can communicate, because for the same semantic unit each language has a corresponding semantic unit representation and for the same sentence meaning each language has a corresponding sentence; where none exists, a group of sentences expressing the semantic unit or sentence meaning can be created.
There are about 4000 languages in the world, including English, Chinese, Japanese, German, etc. They can all be seen as different representations of one unified semantic language.
As shown in FIG. 1, according to this embodiment of the present invention, first, in step 100, a sentence of the original text is extracted. The original text here refers to the source-language article that is to be translated. In the process of machine translation, the original text to be translated is input into the computer by any of various prior-art methods, such as keyboard input or scanning and recognition, or is obtained from another computer through a network. The input text will typically be a whole article or a part of one, and therefore a single sentence needs to be extracted first.
Then, in step 200, the sentence is semantically analyzed according to the semantic unit representation library, so as to obtain the sentence meaning expression of the sentence. Next, in step 300, the sentence meaning expression is expanded into the representation of the target language according to the semantic unit representation library. Finally, in step 400, the expanded sentence is output as the translation. The machine translation method according to the embodiment of the present invention will be described in detail below with reference to FIGS. 2 to 4.
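As an illustration of step 100 only, the extraction of individual sentences from the input text might be sketched as follows in Python. The function name `extract_sentences`, the choice of terminator characters, and the sample text are assumptions made for this sketch; the description does not prescribe any particular extraction method.

```python
import re

# Assumption: a sentence ends with one of these terminators. Latin-script
# text needs more care (e.g. the "." in "Mr." does not end a sentence).
_SENTENCE = re.compile(r"[^。！？]+[。！？]?")


def extract_sentences(text: str) -> list:
    """Step 100 (sketch): split the original text into individual sentences."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


# Illustrative input consisting of two Chinese sentences.
sentences = extract_sentences("陈先生是工程师。我是学生。")
```

Each extracted sentence would then be passed, one at a time, to the semantic analysis of step 200.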
FIGS. 2A and 2B are examples of a semantic unit representation library according to an embodiment of the present invention. A semantic unit representation library is a collection of data that records the semantic unit representations of one or more natural languages.
FIG. 2A shows the semantic units in the semantic unit representation library of this example and their contents, including the semantic unit representations in the respective languages. As shown in FIG. 2A, the semantic unit representation library includes the following fields: the semantic unit ID, which uniquely identifies a semantic unit and can generally be a serial number or any other non-repeating numerical value or character string; it can also be a row number, in which case the semantic unit ID need not be stored; the number and types of parameters, which record the number and the types of the parameters contained in the semantic unit; the Chinese representation of the semantic unit, which records the Chinese representation of the corresponding semantic unit; the English representation of the semantic unit, which records the English representation of the corresponding semantic unit; and the Japanese representation of the semantic unit, which records the Japanese representation of the corresponding semantic unit.
FIG. 2B lists semantic units and their corresponding easy-to-remember written forms.
As can be seen from the examples of FIGS. 2A and 2B, the semantic unit representation library is in effect a database that records semantic units, in which the representations of a semantic unit in different languages are associated with one another through the primary key or the semantic unit ID. It should be understood that other variations of the semantic unit representation library are possible. For example, the semantic unit representations of each language can be recorded in a separate table, and the tables for the several languages can then be linked by a primary key or foreign key; further, other fields may also be included, such as the attributes of the parameters of semantic units and the correspondence between parameters.
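By way of illustration, such a library might be held in memory as a table keyed by semantic unit ID. The Python sketch below is an assumption-laden example: the rows are inferred from the worked example sentence used later in this description (semantic unit IDs 1 to 4) rather than copied from FIG. 2A, and the `{0}`, `{1}` placeholders that mark parameter positions, as well as the names `LIBRARY`, `representation` and `languages`, are hypothetical.

```python
# Hypothetical contents, inferred from the worked example (IDs 1 to 4).
# "{0}", "{1}" mark where the parameters go; this placeholder syntax is an
# assumption of the sketch, not the format used in FIG. 2A.
LIBRARY = {
    1: {"params": 2, "zh": "{0}是{1}", "en": "{0} is an {1}", "ja": "{0}は{1}です"},
    2: {"params": 1, "zh": "{0}先生", "en": "Mr. {0}", "ja": "{0}さん"},
    3: {"params": 0, "zh": "陈", "en": "Chen", "ja": "陳"},
    4: {"params": 0, "zh": "工程师", "en": "engineer", "ja": "技師"},
}


def representation(unit_id: int, language: str) -> str:
    """Look up the representation of a semantic unit in one language."""
    return LIBRARY[unit_id][language]


def languages() -> set:
    """Languages for which every semantic unit has a representation."""
    per_unit = [set(row) - {"params"} for row in LIBRARY.values()]
    return set.intersection(*per_unit)
```

Note that the English row for unit 1 hard-codes the article "an", which is a simplification; a fuller library would need the parameter attributes mentioned above to choose such forms.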
FIG. 3 is a detailed flow diagram of the semantic analysis step in the semantic-based machine translation method according to one embodiment of the present invention. As shown in FIG. 3, first, in step 201, all semantic unit representations that match in the sentence of the original text, together with their corresponding semantic units, are found from the semantic unit representation library. Many matching methods exist in the prior art that can implement this step, such as the search methods commonly used in artificial intelligence, e.g. breadth-first and depth-first search. Next, in step 202, the semantic unit representations in the sentence that have no parameters, or whose parameters have already been replaced, are replaced with their corresponding semantic units. In step 203, it is determined whether all semantic unit representations have been replaced; if not, step 202 is repeated. Steps 202 and 203 need to be repeated because, in real language, syntactic elements tend to be nested in multiple layers. Once the result of the determination in step 203 is yes, a sentence meaning expression is formed in step 204.
The following describes the above semantic analysis process with specific example sentences.
Suppose the original text is the Chinese sentence "陈先生是工程师" ("Mr. Chen is an engineer"). The semantic analysis process is as follows:
陈先生是工程师 → 陈(3)先生是工程师(4) → 先生(陈)(2)是工程师 → 是_职称(先生(陈), 工程师)(1)
As shown above, "陈" (Chen) and "工程师" (engineer) are first replaced with the semantic units whose semantic unit IDs are 3 and 4; then the semantic unit representation with a parameter, "先生" (Mr.), is replaced, because its parameter "陈" has been replaced; finally, the semantic unit representation with parameters, "是" (is), is replaced, because both of its parameters have been replaced. The resulting "是_职称(先生(陈), 工程师)" is the sentence meaning expression of the sentence. It should be noted that the above description uses the easy-to-remember written forms of the semantic units; in the computer, each semantic unit is actually replaced by a machine-readable identifier, such as the semantic unit ID, so the above sentence meaning expression can be written as: 1(2(3),4).
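To make the ID-based notation concrete, the following Python sketch reads a sentence meaning expression such as 1(2(3),4) into a nested structure. The function name `parse` and the tuple layout (unit ID followed by its arguments) are assumptions of this sketch; the description does not specify how the expression is stored internally.

```python
def parse(text: str) -> tuple:
    """Parse '1(2(3),4)' into nested tuples: (unit ID, argument, ...)."""
    pos = 0

    def unit() -> tuple:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        if start == pos:
            raise ValueError(f"expected a unit ID at position {pos}")
        unit_id = int(text[start:pos])
        args = []
        if pos < len(text) and text[pos] == "(":
            pos += 1                      # consume "("
            args.append(unit())
            while pos < len(text) and text[pos] == ",":
                pos += 1                  # consume ","
                args.append(unit())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                      # consume ")"
        return (unit_id, *args)

    result = unit()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return result


tree = parse("1(2(3),4)")
```

The sketch accepts no whitespace inside the expression and assumes purely numeric semantic unit IDs, which matches the example above but not necessarily every library.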
Likewise, an example of an analysis process for sentences in English and Japanese is as follows:
Mr. Chen is an engineer → Mr. Chen(3) is an engineer(4) → Mr.(Chen)(2) is an engineer → IsTP(Mr.(Chen), engineer)(1);
陳さんは技師です → 陳(3)さんは技師(4)です → さん(陳)(2)は技師です → です(さん(陳), 技師)(1)
As can be seen from the above examples, sentences in different languages that express the same meaning yield the same final sentence meaning expression: 1(2(3),4).
It should be understood that there are many other ways to implement the semantic analysis described above to obtain sentence meaning expressions.
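The loop of steps 201 to 204 can be sketched in Python as follows. This is a deliberately naive illustration, not the matching method of the invention: it assumes a hypothetical library in which `{0}`, `{1}` mark parameter positions, uses regular expressions in place of a real search procedure, and uses internal markers of the form `<n>` to stand for spans that have already been replaced by semantic units.

```python
import re

# Hypothetical Chinese column of a semantic unit representation library.
# "{0}", "{1}" mark parameter positions (an assumed encoding).
LIBRARY_ZH = {1: "{0}是{1}", 2: "{0}先生", 3: "陈", 4: "工程师"}

MARK = r"<(\d+)>"  # internal marker for a span that is already a semantic unit


def analyze(sentence: str, library: dict = LIBRARY_ZH) -> str:
    """Return the sentence meaning expression, e.g. '1(2(3),4)'."""
    exprs = []  # exprs[i] is the expression that marker <i> stands for

    # Step 201 (simplified): one regex per representation; a parameter slot
    # only matches a marker, i.e. a parameter that has already been replaced.
    patterns = []
    for unit_id, template in library.items():
        parts = re.split(r"\{(\d+)\}", template)  # literal, index, literal, ...
        regex = "".join(re.escape(p) if i % 2 == 0 else MARK
                        for i, p in enumerate(parts))
        slots = [int(p) for p in parts[1::2]]     # parameter index per slot
        patterns.append((unit_id, re.compile(regex), slots))

    text, changed = sentence, True
    while changed:  # steps 202 and 203: repeat until nothing more is replaced
        changed = False
        for unit_id, pattern, slots in patterns:
            def replace(match, unit_id=unit_id, slots=slots):
                args = [""] * len(slots)
                for group, index in zip(match.groups(), slots):
                    args[index] = exprs[int(group)]
                exprs.append(f"{unit_id}({','.join(args)})" if args else str(unit_id))
                return f"<{len(exprs) - 1}>"

            text, count = pattern.subn(replace, text)
            changed = changed or count > 0

    done = re.fullmatch(MARK, text)  # step 204: everything folded into one unit
    if done is None:
        raise ValueError(f"could not fully analyze: {text!r}")
    return exprs[int(done.group(1))]


expression = analyze("陈先生是工程师")
```

The greedy loop does not resolve ambiguity. With a prefix form such as "Mr. {0}", for instance, it may bind parameters in the wrong order, which is one reason a real implementation would rely on a proper search over candidate matches as mentioned for step 201.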
FIG. 4 is a detailed flow diagram of the semantic expansion step in the semantic-based machine translation method according to one embodiment of the present invention. As shown in FIG. 4, first, in step 301, the sentence meaning expression is scanned sequentially, and the first semantic unit that has not yet been expanded is read. Next, in step 302, the semantic unit representation of the target language is found from the semantic unit representation library, and the semantic unit is expanded according to that representation. Then, in step 303, it is determined whether all semantic units have been expanded. If all of them have been expanded, the translation in the target language is obtained in step 304; otherwise, steps 301 to 303 are repeated.
The following describes the semantic expansion process with reference to specific illustrative sentences.
Suppose that the sentence meaning expression to be expanded is: 1(2(3),4) = 是_职称(先生(陈), 工程师) = IsTP(Mr.(Chen), engineer) = です(さん(陳), 技師)
The sentence meaning expression is expanded into Chinese as follows:
是_职称(先生(陈), 工程师) → 先生(陈)是工程师 → 陈先生是工程师 → 陈先生是工程师 → 陈先生是工程师
As shown above, the sentence meaning expression is first scanned sequentially to find the first semantic unit, "是(X1, X2)", which is expanded according to its Chinese semantic unit representation, namely: "是" is placed in the middle, and the two parameters "先生(陈)" and "工程师" are placed on either side of "是". Next, the semantic unit "先生(X)" is expanded according to its Chinese semantic unit representation, namely: the semantic unit "陈", as the parameter, precedes "先生". Then the semantic units "陈" and "工程师" are expanded in turn according to their Chinese semantic unit representations, finally giving the Chinese translation "陈先生是工程师".
Likewise, an example of the expansion process in English and Japanese for this sentence is as follows:
IsTP(Mr.(Chen), engineer) => Mr.(Chen) is an engineer => Mr. Chen is an engineer => Mr. Chen is an engineer
です(さん(陳), 技師) => さん(陳)は技師です => 陳さんは技師です => 陳さんは技師です
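The sequential expansion of steps 301 to 304 can be sketched in Python as follows. The library contents, the `{0}`, `{1}` placeholder syntax, and the nested-tuple form of the expression are assumptions made for this illustration only.

```python
import re

# Hypothetical library rows for the worked example; "{0}", "{1}" mark
# parameter positions (an assumed encoding).
TEMPLATES = {
    "zh": {1: "{0}是{1}", 2: "{0}先生", 3: "陈", 4: "工程师"},
    "en": {1: "{0} is an {1}", 2: "Mr. {0}", 3: "Chen", 4: "engineer"},
    "ja": {1: "{0}は{1}です", 2: "{0}さん", 3: "陳", 4: "技師"},
}

# 1(2(3),4) as nested tuples: (unit ID, argument, argument, ...).
EXPRESSION = (1, (2, (3,)), (4,))


def expand(expression: tuple, language: str) -> str:
    templates = TEMPLATES[language]
    items = [expression]  # unexpanded units (tuples) and finished text (str)
    while True:
        # Step 301: scan left to right for the first unit not yet expanded.
        position = next(
            (i for i, item in enumerate(items) if isinstance(item, tuple)), None)
        if position is None:  # steps 303 and 304: nothing left to expand
            return "".join(items)
        unit_id, *args = items[position]
        # Step 302: lay out literal text and parameters as the template says.
        pieces = []
        for i, part in enumerate(re.split(r"\{(\d+)\}", templates[unit_id])):
            if i % 2 == 1:
                pieces.append(args[int(part)])
            elif part:
                pieces.append(part)
        items[position:position + 1] = pieces


translations = {lang: expand(EXPRESSION, lang) for lang in TEMPLATES}
```

Since the same expression is expanded once per language, adding a target language in this sketch only requires adding a column of representations, not changing the expansion logic.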
It will be appreciated that the specific expansion steps described above may be varied widely. For example, instead of following the order in which the semantic units are arranged in the sentence meaning expression, an order similar to that of the semantic analysis can be used: the semantic units that have no parameters, or whose parameters have already been expanded, are expanded first, and all semantic units are then expanded recursively, layer by layer.
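The variation just described, in which innermost units are expanded first, lends itself to a short recursive sketch. The Python example below assumes, hypothetically, that each representation is stored as a template with `{0}`, `{1}` placeholders and that the expression is held as nested tuples; neither detail is taken from the description.

```python
# Hypothetical English representations; "{0}", "{1}" mark parameter positions.
ENGLISH = {1: "{0} is an {1}", 2: "Mr. {0}", 3: "Chen", 4: "engineer"}


def expand_bottom_up(expression: tuple, templates: dict) -> str:
    """Expand the innermost units first, then their enclosing units."""
    unit_id, *args = expression
    expanded_args = [expand_bottom_up(arg, templates) for arg in args]
    return templates[unit_id].format(*expanded_args)


# 1(2(3),4) written as nested tuples: (unit ID, argument, argument, ...).
sentence = expand_bottom_up((1, (2, (3,)), (4,)), ENGLISH)
```

The recursion depth equals the nesting depth of the expression, so each layer is completed only after all of its parameters have been expanded.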
It can be seen from the above description that, by converting the original text into sentence meaning expressions, the machine translation method of the present invention can produce translations in multiple target languages at the same time, as long as the semantic unit representation library contains the semantic unit representations of the corresponding languages.
According to another embodiment of the present invention, after a sentence of the original text is converted into a sentence meaning expression, the method further comprises the step of storing the sentence meaning expression in a storage device; once a paragraph, or all the sentences, of the original text have been converted and stored, the stored expressions are expanded into the target language as required. In other words, semantic analysis is first performed on the original text and the resulting set of sentence meaning expressions is stored; the set is then expanded into natural language, in whichever language is required, when required.
FIG. 5 is a schematic block diagram of a semantic-based machine translation system according to one embodiment of the present invention. As shown in FIG. 5, the machine translation system according to this embodiment of the present invention includes: an original text memory 501 for storing the original text to be translated; a semantic unit representation library 506 for recording the semantic unit representations, in two or more languages, that correspond to the semantic units; a semantic analyzer 504 for analyzing a sentence in the original text and converting it into a sentence meaning expression according to the semantic unit representations of the original language recorded in the semantic unit representation library 506; a sentence meaning expression memory 502 for storing the sentence meaning expression produced by the semantic analyzer 504; a semantic expander 505 for expanding the sentence meaning expression into a sentence of the target language according to the semantic unit representations of the target language recorded in the semantic unit representation library 506; and a translation output means 503 for outputting the sentence of the target language expanded by the semantic expander 505 as the translation.
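The data flow between these components can be sketched in Python as follows. The class name `TranslationSystem`, its method names, and the toy analyzer and expander (which treat a whole sentence as a single semantic unit without parameters) are stand-ins invented for this illustration; only the reference numerals in the comments come from FIG. 5.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TranslationSystem:
    library: Dict                                             # library 506
    analyzer: Callable[[str, Dict], str]                      # analyzer 504
    expander: Callable[[str, Dict, str], str]                 # expander 505
    original_text: List[str] = field(default_factory=list)    # memory 501
    expressions: List[str] = field(default_factory=list)      # memory 502

    def load(self, sentences: List[str]) -> None:
        self.original_text = list(sentences)

    def analyze_all(self) -> None:
        """Convert every stored sentence and keep the expressions."""
        self.expressions = [self.analyzer(s, self.library)
                            for s in self.original_text]

    def translate(self, language: str) -> List[str]:          # output 503
        """Expand the stored expressions into the requested language."""
        return [self.expander(e, self.library, language)
                for e in self.expressions]


# Toy stand-ins: each whole sentence is one parameterless unit (hypothetical).
toy_library = {"1": {"zh": "陈先生是工程师", "en": "Mr. Chen is an engineer"}}


def toy_analyzer(sentence: str, library: Dict) -> str:
    return next(uid for uid, reps in library.items() if sentence in reps.values())


def toy_expander(expression: str, library: Dict, language: str) -> str:
    return library[expression][language]


system = TranslationSystem(toy_library, toy_analyzer, toy_expander)
system.load(["陈先生是工程师"])
system.analyze_all()
english = system.translate("en")
```

Keeping the expressions in their own store means analysis runs once, while `translate` can be called later for any language present in the library.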
Those skilled in the art will appreciate that the machine translation system described above may be a computer or other computing device having processing capabilities. The computing device should include a processor, a memory, and corresponding input-output devices. The components of the machine translation system can be implemented in hardware or software. Of course, a user may also use the system over a network, or use it as an aid in searching, reading, or translating information on the network.
In addition, as is well known, the object of the present invention can also be achieved by providing a system or apparatus with a recording medium on which software program code realizing the functions of the foregoing embodiments is recorded, and by causing the computer (or CPU, or MPU) of the system or apparatus to read the program code stored on the recording medium and execute commands according to it. In this case, the program code read from the recording medium itself realizes the functions of the foregoing embodiments, and the recording medium on which the program code is recorded constitutes the present invention. The recording medium for recording the program code, or variable data such as tables, may be a magnetic disk (such as a flexible disk or a hard disk), an optical disk, or any nonvolatile memory card.
The principles, features and advantages of the present invention have been described above with reference to specific embodiments thereof. It is to be understood that the invention is not limited to the particular embodiments described above, as variations and modifications may be made and equivalents may be substituted for elements thereof. The scope of the invention is only limited by the appended claims.