CN101777043A - Word conversion method and device - Google Patents
- Publication number
- CN101777043A (CN200910076370A)
- Authority
- CN
- China
- Prior art keywords
- speech
- sentence
- language
- semantic
- semantic item
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Machine Translation (AREA)
Abstract
The invention discloses a word conversion method for improving the accuracy of automatic natural-language translation. The method comprises the following steps: obtaining the semantic item corresponding to a word expressed in first-language text; obtaining, from that semantic item, a semantic code or the corresponding semantic item expressed in second-language text; obtaining a word expressed in second-language text from the obtained semantic code or second-language semantic item; and organizing the second-language words into a sentence according to the syntax rules of the second language. The invention also discloses a device and a system for implementing the method.
Description
Technical field
The present invention relates to the fields of computing and communications, and in particular to a word conversion method and device.
Background art
With the development of information technology and the trend toward global network integration, machine translation technology has steadily improved, and translation software has become increasingly useful as a translation aid. At present there are no fewer than a hundred machine translation software products in China. According to their translation characteristics, they fall roughly into three classes: dictionary translation, Chinesization translation, and technical translation.
The dictionary translation class only supports looking up and translating individual words; it cannot translate sentences or articles.
In the Chinesization translation class, sentences with identical or similar patterns occur frequently in everyday translation, and most software uses a translation kernel based on a syntax-tree structure. Such a kernel suits fairly regular sentences, but a rigid translation scheme often fails on highly variable natural language. Another kind of translation kernel is based on grammatical patterns: a generalized syntactic structure built on the traditional syntax tree and on natural language. The computer translates according to these rules by analogy ("inferring other cases from one instance") and, as it keeps "learning" new grammar, produces higher-quality reference translations. The generalized grammar is founded on "sentence patterns" and "classification", which makes the grammar simpler to express and easier to use, supports inheritance between classes, and raises the system's level of technical translation. This approach nevertheless has limitations and is still not accurate enough for flexible natural-language translation.
The technical translation class is a human-assisted mode: staff with foreign-language skills must help complete the translation, so a satisfactory translation cannot be produced automatically.
In summary, the translation results achieved by existing translation methods are all unsatisfactory.
Summary of the invention
The embodiments of the invention provide a word conversion method and device for improving the accuracy of automatic natural-language translation.
A word conversion method comprises the following steps:
obtaining, from an acquired word expressed in first-language text, the semantic item corresponding to the word;
obtaining, from the semantic item corresponding to the word, the corresponding semantic code or the corresponding semantic item expressed in second-language text;
obtaining a word expressed in second-language text from the obtained semantic code or second-language semantic item;
organizing the second-language words into a sentence according to the grammatical expression rules of the second language.
A device for text conversion comprises:
a semantic query module, configured to obtain, from an acquired word expressed in first-language text, the semantic item corresponding to the word;
a semantic conversion module, configured to obtain, from the semantic item corresponding to the word, the corresponding semantic code or the corresponding semantic item expressed in second-language text;
a word lookup module, configured to obtain a word expressed in second-language text from the obtained semantic code or second-language semantic item;
an organization module, configured to organize the second-language words into a sentence according to the syntax rules of the second language.
A system for text conversion comprises:
a first user equipment, configured to: obtain a sentence expressed in first-language text; obtain the sentence structure code of the sentence; output the sentence to the user in the form of the general semantic structure according to the sentence structure code, and/or output to the user the semantic item of each word in the sentence according to the sentence structure code; and send the sentence structure code, once confirmed by the user, to the server and the second user equipment. The sentence structure code represents the general semantic structure; where a structure of the general semantic structure has a corresponding word, the sentence structure code also contains the semantic code corresponding to the semantic item of that word;
a server, configured to determine whether it holds locally the sentence structure code corresponding to the first-language sentence; if so, to send the sentence structure code of the sentence to the first user equipment; otherwise, to obtain the sentence structure code from the first-language sentence and send it to the first user equipment, or to instruct the first user equipment to obtain the sentence structure code from the first-language sentence;
a second user equipment, configured to obtain second-language words from the semantic codes in the received sentence structure code, and to organize the second-language words into a sentence according to the grammatical expression rules of the second language and the sentence structure code.
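The three-way decision made by the server can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the class name `SentenceCodeServer`, the `analyze` callback, the returned tuples, and the choice to remember newly computed codes are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


class SentenceCodeServer:
    """Hypothetical server that answers sentence-structure-code requests."""

    def __init__(self, analyze: Optional[Callable[[str], str]] = None) -> None:
        # sentence -> sentence structure code already held locally
        self.known_codes: Dict[str, str] = {}
        # None models a server that cannot analyze sentences itself
        self.analyze = analyze

    def handle(self, sentence: str) -> Tuple[str, Optional[str]]:
        # Case 1: the code is held locally, so it is sent back directly.
        if sentence in self.known_codes:
            return ("code", self.known_codes[sentence])
        # Case 2: the server derives the code from the sentence and sends it.
        if self.analyze is not None:
            code = self.analyze(sentence)
            self.known_codes[sentence] = code  # assumption: keep it for next time
            return ("code", code)
        # Case 3: the first user equipment is told to derive the code itself.
        return ("analyze_locally", None)
```

Returning a tagged tuple keeps the "send the code" and "instruct the first user equipment" outcomes distinguishable for the caller.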
The embodiments of the invention look up, for each first-language word, a second-language word with the same or a similar meaning, thereby translating from the first language to the second language with more accurate conversion results.
Description of drawings
Fig. 1 is the main flowchart of the word conversion method in an embodiment of the invention;
Fig. 2 is the detailed flowchart of the word conversion method in an embodiment of the invention;
Fig. 3 is a schematic diagram of the first-language operation interface in an embodiment of the invention;
Fig. 4 is a schematic diagram of the second-language operation interface in an embodiment of the invention;
Fig. 5 is a detailed flowchart of the word conversion method carried out according to user input and instructions in an embodiment of the invention;
Fig. 6 is a schematic diagram of an operation interface that includes semantic item input in an embodiment of the invention;
Fig. 7 is the main structural diagram of the device in an embodiment of the invention;
Fig. 8 is the detailed structural diagram of the device in an embodiment of the invention;
Fig. 9 is the structural diagram of the system in an embodiment of the invention.
Detailed description of the embodiments
The inventor has found that the basic principle of natural-language communication is communication by semantic convention. General laws of sentence-level semantic convention that hold across natural languages can first be derived from general knowledge of natural language, and text conversion between languages can then be carried out according to this set of laws.
Law 1: The meanings of the vocabulary and fixed collocations of any natural language can all be defined by semantic convention using that language's basic vocabulary (for example, the Longman English dictionary defines every word and fixed collocation using somewhat more than 1000 common words). Moreover, the basic vocabulary of one natural language can define by semantic convention any vocabulary of another language; that is, the common words of one natural language can semantically describe any lexical symbol of other natural languages.
Law 2: The law of semantic convention for basic vocabulary follows naturally from Law 1: the meanings of the basic vocabulary of a natural language are defined circularly in terms of one another (for example, "'good': the antonym of 'bad'"; "'father': a son's dad"; "'beautiful': pretty, lovely, good-looking, ...").
Law 3: The law of redundant semantic symbols in natural language: without reference to context outside the sentence, if omitting a semantic symbol from the sentence (a word, or the textual expression of a grammar or syntax rule) does not affect the reader's understanding of the meaning, and the reader knows which symbol is missing, then that symbol is a habitually expressed redundant symbol (for example, most measure words in Chinese, the indefinite article in English, and the marking rules of the subjunctive mood for unrealizable situations). For communicating semantic information between different natural languages, such redundant symbols need no semantic convention.
Law 4: Only the grammatical concepts common to all natural languages are necessary grammatical concepts. Natural languages express grammatical concepts in different forms, and natural-language translation may be either literal or free. Whether literal or free, however, the grammar conversion in any natural-language translation adds, without basis in the source text, grammatical concepts specific to the target language that the source text lacks. The language-specific grammatical concepts of every natural language therefore in fact satisfy the redundancy criterion that "omitting a word or the textual expression of a grammar or syntax rule does not affect the reader's understanding of the meaning, and the reader knows which symbol is missing".
Law 5: Any non-standard sentence semantic expression (covering both lexical and grammatical meaning) can be defined by semantic convention using a standard sentence semantic expression (for example, any fixed-collocation sentence, elliptical sentence, or colloquial sentence can be defined using a standard written sentence expression).
In practical terms, these laws make possible a natural-language translation method based on semantic convention that is common to all natural languages and can safeguard the quality with which semantic information is conveyed.
The embodiments of the invention use a first-language word to look up the second-language word corresponding to that word's semantic item; that is, first-language text is converted to second-language text on the basis of identical or similar semantic items, which makes the conversion result more accurate.
A semantic item in the embodiments is an explanation of one meaning of a word. For example, meaning 1 of the word "good": adjective: opposite of bad; excellent, fine, good, wonderful, outstanding (0001-1); meaning 2: noun: friendly, amicable, harmonious, congenial (0001-2). The content given for meaning 1 and meaning 2 is the semantic item, and 0001-1 and 0001-2 are the semantic codes corresponding to those semantic items.
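One way to picture the relation between a word, its semantic items, and their semantic codes is a small lookup table. The sketch below is an assumption for illustration only: the names `SemanticItem`, `LEXICON`, and `semantic_codes` are hypothetical, and the explanations are abbreviated from the "good" example above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SemanticItem:
    code: str            # semantic code, e.g. "0001-1"
    part_of_speech: str  # part of speech attached to this meaning
    explanation: str     # the meaning explanation, i.e. the semantic item itself


# Hypothetical lexicon holding only the document's "good" example.
LEXICON: Dict[str, List[SemanticItem]] = {
    "good": [
        SemanticItem("0001-1", "adjective", "opposite of bad; excellent, fine"),
        SemanticItem("0001-2", "noun", "friendly, amicable, harmonious"),
    ],
}


def semantic_codes(word: str) -> List[str]:
    """Return the semantic codes of every semantic item recorded for a word."""
    return [item.code for item in LEXICON.get(word, [])]
```

Because the code, not the explanation text, is the key shared across languages, a second-language lexicon could reuse the same codes with explanations written in that language.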
Referring to Fig. 1, the main flow of the word conversion method in this embodiment is as follows:
Step 101: Obtain, from an acquired word expressed in first-language text, the semantic item corresponding to the word.
Step 102: Obtain, from the semantic item corresponding to the word, the corresponding semantic code or the corresponding semantic item expressed in second-language text.
Step 103: Obtain a word expressed in second-language text from the obtained semantic code or second-language semantic item.
Step 104: Organize the second-language words into a sentence according to the grammatical expression rules of the second language.
The grammatical expression rules are the rules by which grammatical concepts are expressed. For example, English and Chinese mainly use word order within the sentence to express sentence elements such as subject and object, whereas Russian marks the nominative and objective cases directly on the word to express sentence elements. English expresses the "past perfect" tense by adding an additional symbol plus verb inflection, while Chinese expresses it with a separate lexical symbol placed after the verb (the particle 过, "... guo").
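Steps 101 to 104 can be sketched end to end with toy tables. Everything here is a hypothetical illustration: the three table names, the `convert` function, the English glosses used as semantic items, and the use of a role order to stand in for the second language's grammatical expression rules are assumptions; only the code format and the code values 0012-1, 0013-1, and 0015-1 are borrowed from the document's own example.

```python
from typing import Dict, List, Sequence

# Hypothetical toy tables for a Chinese-to-English conversion.
WORD_TO_ITEM: Dict[str, str] = {      # step 101: word -> semantic item
    "我": "the speaker",
    "朗读": "to read aloud",
    "报告": "a report",
}
ITEM_TO_CODE: Dict[str, str] = {      # step 102: semantic item -> semantic code
    "the speaker": "0012-1",
    "to read aloud": "0013-1",
    "a report": "0015-1",
}
CODE_TO_WORD: Dict[str, str] = {      # step 103: semantic code -> second-language word
    "0012-1": "I",
    "0013-1": "read aloud",
    "0015-1": "the report",
}


def convert(words: Sequence[str], roles: Sequence[str],
            role_order: Sequence[str]) -> str:
    """Convert segmented first-language words into a second-language sentence."""
    items: List[str] = [WORD_TO_ITEM[w] for w in words]     # step 101
    codes: List[str] = [ITEM_TO_CODE[i] for i in items]     # step 102
    target: List[str] = [CODE_TO_WORD[c] for c in codes]    # step 103
    by_role = dict(zip(roles, target))
    # Step 104: a word-order rule stands in for the grammatical expression rules.
    return " ".join(by_role[r] for r in role_order if r in by_role)
```

Passing a different `role_order`, for instance subject, object, predicate, shows how the same semantic codes could be arranged for a language with a different word order.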
The correspondence among first-language words, first-language semantic items, semantic codes, second-language semantic items, and second-language words is shown in Table 1:
Table 1
Referring to Fig. 2, the detailed flow of the word conversion method in this embodiment is as follows:
Step 201: Obtain a sentence expressed in first-language text and perform word segmentation on it to obtain the first-language words. Segmentation may be done automatically or through user operation. If a paragraph or article is obtained, it can first be split into sentences and then segmented.
Step 202: Obtain, from each first-language word, the semantic item corresponding to the word. The correspondence between words and semantic items is set in advance.
When the word has several semantic items, the semantic item is determined from the word's part of speech in the sentence, and/or the most frequently used semantic item is selected from among them and its usage frequency is updated. In this embodiment the usage frequency may be the number of times the semantic item has been used, the ratio of that number to the total number of uses of all semantic items, the ratio of the semantic item's uses within a period to the total uses of all semantic items in that period, or any other parameter that reflects frequency.
Step 203: Map the obtained first-language words into the structures of the general semantic structure.
This step may be performed before step 202; the part of speech of each word is then determined from the structure it is mapped to, and step 202 determines the semantic item from that part of speech.
Step 204: Look up the semantic code corresponding to the obtained semantic item. The correspondence between semantic items and semantic codes is established in advance.
Step 205: Obtain the second-language word from the obtained semantic code. When a semantic item corresponds to several words, the most frequently used word can be selected and its usage frequency updated. A suitable word can also be selected according to the required part of speech.
Step 206: Organize the second-language words into a sentence according to the grammatical expression rules of the second language and the structure each word occupies in the general semantic structure.
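The frequency-based choice in steps 202 and 205 can be sketched as below. This is a minimal sketch under assumptions: the class name `FrequencySelector` is hypothetical, the usage frequency is taken to be the raw use count (one of the options listed in step 202), and ties are resolved in favor of the earlier candidate.

```python
from collections import Counter
from typing import List, Optional, Tuple

Candidate = Tuple[str, str]  # (semantic code or word, part of speech)


class FrequencySelector:
    """Pick the most frequently used candidate and update its use count."""

    def __init__(self) -> None:
        self.usage: Counter = Counter()  # candidate key -> number of uses

    def choose(self, candidates: List[Candidate],
               part_of_speech: Optional[str] = None) -> Candidate:
        # Narrow by part of speech first when one is required.
        pool = [c for c in candidates if c[1] == part_of_speech] if part_of_speech else []
        if not pool:
            pool = list(candidates)
        # max() keeps the first candidate when use counts are equal.
        best = max(pool, key=lambda c: self.usage[c[0]])
        self.usage[best[0]] += 1  # update the usage frequency
        return best
```

The same selector works for semantic items in step 202 and for second-language words in step 205, since both only need a key and a part of speech.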
Taking Chinese as the first language, and referring to Fig. 3, area 3-1 shows an article, which may be entered by the user or obtained from e-mail, instant messaging, a web page, and so on. The article in 3-1 has already been segmented into words. Segmentation may be automatic or performed by the user, or performed automatically and then confirmed by the user; if the user considers the segmentation incorrect, it can be corrected by modifying the " " marks in 3-1.
Area 3-2 shows one segmented sentence, "Her Majesty the Queen requires me to read aloud the following report". The small text beside each word is the semantic item of that word; for example, the semantic item (that is, the meaning) of "Her Majesty the Queen" is "(part of speech) honorific title for a queen". While the semantic items are being obtained, the words can be mapped into the structures of the general semantic structure shown in 3-3. In this embodiment the general semantic structure comprises a subject zone, a predicate zone, and an object zone, and each zone comprises a modifier region, a core region, and an additional region. Elements of uncertain role, such as conjunctions between sentences or special syntactic function words, are mapped to the additional region of the subject zone. The structure shown in 3-4 is a further analysis of the clause in 3-3.
To ease data transmission between modules or devices, this embodiment assigns a code to each structure. For example, reading 3-3 from top to bottom and left to right, the structures are coded A1, A2, A3, B1, B2, B3, C1, C2, and C3. If the semantic codes of the semantic items of the words in the sentence "Her Majesty the Queen requires me to read aloud the following report" are 0010-1, 0011-1, 0012-1, 0013-1, 0014-1, and 0015-1, the sentence is encoded as "A1 (), A2 (0010-1), A3 (), B1 (), B2 (0011-1), B3 (), C1 (), C2 (A1 (), A2 (0012-1), A3 (), B1 (), B2 (0013-1), B3 (), C1 (0014-1), C2 (0015-1), C3 ()), C3 ();". Semantic items with the same meaning in different languages can correspond to the same semantic code.
Text conversion is then carried out from this code. Taking English as the second language, and referring to Fig. 4, the second-language semantic items, shown in small type in 4-2, are determined from the semantic codes. The corresponding words, shown in large type in 4-2, are then determined from these semantic items. Areas 4-3 and 4-4 show the structures that the obtained words occupy; according to these structures and English grammar, the contents of the structures are assembled into the sentence shown in 4-1. Chinese-to-English translation is thus achieved; the content of Fig. 4 is the English translation of the content of Fig. 3.
The interface in Fig. 4 also provides a query function. For example, the sentence can be displayed first in 4-1; by clicking the sentence or a word in it, the user can query the structure of the sentence, the semantic item of each word, and/or the structure each word occupies, with the results displayed in 4-2 and/or 4-3. From the semantic items shown in 4-2, the user can grasp the meaning of each word in the sentence more precisely and so understand the sentence more accurately.
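The sentence structure code quoted above can be reproduced with a short recursive encoder. This is a sketch under assumptions: the function names are hypothetical, a nested dictionary is just one convenient way to hold the structure, and the reading of A, B, and C as subject, predicate, and object zones with 1, 2, and 3 as modifier, core, and additional regions is inferred from the example rather than stated outright.

```python
from typing import Dict

# Slot order read top to bottom, left to right, as in area 3-3.
SLOTS = ("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3")


def encode_structure(structure: Dict[str, object]) -> str:
    """Serialize one level of the general semantic structure."""
    parts = []
    for slot in SLOTS:
        value = structure.get(slot, "")  # an empty slot is written as "()"
        # A nested dictionary models a clause occupying a slot.
        inner = encode_structure(value) if isinstance(value, dict) else value
        parts.append(f"{slot} ({inner})")
    return ", ".join(parts)


def encode_sentence(structure: Dict[str, object]) -> str:
    """Serialize a whole sentence; the code ends with a semicolon."""
    return encode_structure(structure) + ";"


# The document's example: the object core (C2) holds a clause.
clause = {"A2": "0012-1", "B2": "0013-1", "C1": "0014-1", "C2": "0015-1"}
sentence = {"A2": "0010-1", "B2": "0011-1", "C2": clause}
```

Calling `encode_sentence(sentence)` yields the same string as the example, which suggests the code is a plain pre-order walk of the nine slots at each clause level.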
The process in step 203 of mapping the first-language words into the structures of the general semantic structure includes determining all possible parts of speech of every word in the sentence and identifying the words that have a unique part of speech. In the sentence "Her Majesty the Queen requires me to read aloud the following report", for example, "Her Majesty the Queen" is a noun, "requires" is a verb or a noun, "I" is a pronoun, "read aloud" is a verb, "following" is a preposition, and "report" is a noun; the words with a unique part of speech are "Her Majesty the Queen", "I", "read aloud", "following", and "report". The role of each word in the sentence is then determined from the grammatical expression rules and/or the part-of-speech requirements of each structure in the general semantic structure; preferably, the roles of the words with a unique part of speech are determined first, and then those of the other words. For instance, "Her Majesty the Queen" is a noun and so may be a subject or an object; because it appears at the start of the sentence and is followed by "requires", which can be a verb, it is determined to be the subject, "requires" is therefore the predicate, and what follows "requires" should be the object. The pronoun "I" may be a subject or an object; because the word that follows it, "read aloud", is a verb, "I" is determined to be the subject of the clause "I read aloud the following report", "read aloud" is the predicate, and "the following report" is the object.
The above illustrates determining each word's role using judgment conditions set according to the grammatical expression rules. Alternatively, the parts of speech of the words in the sentence can be matched, in order, against the parts of speech of the structures in preset valid sentence patterns; if they match, the role of each word can be determined from the matched valid sentence pattern. For example, the parts of speech of the words in "Her Majesty the Queen requires me to read aloud the following report" match, or have the highest matching degree with, the valid sentence pattern "(main clause) subject: noun; (main clause) predicate: verb; (object) clause subject: pronoun; (object) clause predicate: verb; (object) clause object: noun", so the sentence element of each word can be determined. If the sentence matches none of the valid sentence patterns, or every matching degree is below a preset threshold (such as 60%), the sentence has an invalid sentence pattern; the user then re-enters it or divides its structure through operations in 3-3. A new valid sentence pattern can be generated from the structure as divided by the user.
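Matching a sentence against preset valid sentence patterns might look like the sketch below. The document does not define how the matching degree is computed, so the definition used here (the fraction of positions whose required part of speech is among the word's possible parts of speech) is an assumption, as are the names `VALID_PATTERNS`, `matching_degree`, and `best_pattern`. For brevity the modifier "following" is left out, so only the five core words are matched.

```python
from typing import Dict, List, Optional, Set, Tuple

Pattern = List[Tuple[str, str]]  # (sentence role, required part of speech)

# Hypothetical pattern table; the first entry follows the document's example.
VALID_PATTERNS: Dict[str, Pattern] = {
    "object-clause": [
        ("main subject", "noun"), ("main predicate", "verb"),
        ("clause subject", "pronoun"), ("clause predicate", "verb"),
        ("clause object", "noun"),
    ],
    "simple": [("subject", "noun"), ("predicate", "verb"), ("object", "noun")],
}


def matching_degree(word_pos: List[Set[str]], pattern: Pattern) -> float:
    """Fraction of positions whose required part of speech is possible."""
    if not pattern or len(word_pos) != len(pattern):
        return 0.0
    hits = sum(1 for options, (_, pos) in zip(word_pos, pattern) if pos in options)
    return hits / len(pattern)


def best_pattern(word_pos: List[Set[str]],
                 patterns: Dict[str, Pattern] = VALID_PATTERNS,
                 threshold: float = 0.6) -> Optional[str]:
    """Return the best-matching pattern name, or None for an invalid sentence pattern."""
    if not patterns:
        return None
    degree, name = max((matching_degree(word_pos, p), n) for n, p in patterns.items())
    return name if degree >= threshold else None


# Possible parts of speech of: Queen, requires, I, read aloud, report.
example = [{"noun"}, {"verb", "noun"}, {"pronoun"}, {"verb"}, {"noun"}]
```

A `None` result corresponds to the case where the user is asked to re-enter the sentence or divide its structure by hand.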
The interface shown in Fig. 3 is operable in this embodiment, and the user can change its contents. For example, a word in 3-1 can be dragged with the mouse into 3-2 or 3-3, or a word can be typed directly into 3-3, and 3-2 and 3-1 are updated automatically to reflect the content of 3-3. When a word has several semantic items, they can be displayed beside the word in 3-2 for the user to choose from; if the user considers none of the semantic items shown in 3-2 accurate, the user can enter a semantic item directly. The text conversion flow with user operations is described in detail below.
Referring to Fig. 5, the flow of text conversion driven by user operations in this embodiment is as follows:
Step 501: Obtain a sentence expressed in first-language text and perform word segmentation on it to obtain the first-language words. If a paragraph or article is obtained, it can first be split into sentences and then segmented.
Step 502: Obtain, from each first-language word, the semantic item corresponding to the word.
When the word has several semantic items, one of them is determined by user operation. Alternatively, referring to Fig. 6, the user enters the semantic item in the "semantic description" field of 6-1, and this semantic item is displayed in 3-2. Settings the user makes in 6-1, such as tense, allow a more accurate conversion result.
Step 503: Map the first-language words into the structures of the general semantic structure through the user's drag operations. It may further be checked whether the result after dragging satisfies the part-of-speech matching rules of the valid sentence patterns; if not, the user is prompted to repeat the operation.
Step 504: Convert the obtained semantic item into a semantic item expressed in second-language text. This process can follow the text conversion flow of Fig. 1 or Fig. 2.
Step 505: Obtain the second-language word from the second-language semantic item. In this step, the second-language semantic item itself is used as the second-language word. For example, the semantic item of the Chinese fixed collocation rendered here as "double gain" is "one action achieves two purposes"; there may be no corresponding English fixed collocation, in which case "one action achieves two purposes" is translated into English and that English expression replaces "double gain" at the position where it occurs.
Step 506: Organize the second-language words into a sentence according to the grammatical expression rules of the second language and the structure each word occupies in the general semantic structure.
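The substitution in step 505 amounts to a lookup with a fallback, sketched below. The function name, the two table names, and the semantic code "0300-1" are hypothetical; the English rendering of the semantic item is taken from the "double gain" example above.

```python
from typing import Dict


def second_language_expression(code: str,
                               words_by_code: Dict[str, str],
                               items_by_code: Dict[str, str]) -> str:
    """Use a second-language word if one exists, else the semantic item itself."""
    word = words_by_code.get(code)
    # No fixed expression in the second language: fall back to the
    # second-language semantic item, which then stands in for the word.
    return word if word is not None else items_by_code[code]


# Hypothetical tables: "0300-1" stands for the collocation "double gain".
WORDS_BY_CODE = {"0015-1": "report"}
ITEMS_BY_CODE = {
    "0015-1": "a written or spoken account",
    "0300-1": "one action achieves two purposes",
}
```

Keeping the fallback inside one function means the organizing step does not need to know whether it received a dictionary word or a translated explanation.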
After step 503, the corresponding semantic code can be obtained from the obtained semantic item. When the semantic item is defined by the user, a semantic code is generated for it automatically according to the semantic code generation rule; it suffices that the generated code uniquely identifies the semantic item. A correspondence is also established between the generated semantic code and the converted second-language semantic item, so that second-language users can query it directly. Many generation rules are possible; for example, the semantic codes of the semantic items of the words in the user-defined semantic item can be combined in order into a new semantic code, which is the semantic code of the user-defined semantic item.
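The example generation rule, combining the semantic codes of the words in a user-defined semantic item in order, can be sketched as follows. The separator "+", the function name, the registry dictionary, and the sample codes are assumptions; the document only requires that the generated code uniquely identify the semantic item.

```python
from typing import Dict, Sequence


def generate_semantic_code(item_words: Sequence[str],
                           word_codes: Dict[str, str],
                           registry: Dict[str, str],
                           definition: str) -> str:
    """Build a code for a user-defined semantic item and register it."""
    # Combine the codes of the item's words in their original order.
    code = "+".join(word_codes[w] for w in item_words)
    existing = registry.get(code)
    if existing is not None and existing != definition:
        # The generated code must uniquely identify one semantic item.
        raise ValueError(f"semantic code {code} already identifies another item")
    registry[code] = definition
    return code
```

For example, with hypothetical codes `{"one": "0101-1", "action": "0102-1"}`, the item "one action" would receive the code `0101-1+0102-1`.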
The above describes how text conversion is carried out. The method can be implemented by a single device or by a system composed of several devices; the device that implements the method is described first.
Referring to Fig. 7, the text conversion device comprises a semantic query module 701, a semantic conversion module 702, a word lookup module 703, and an organization module 704.
The word lookup module 703 is configured to obtain a word expressed in second-language text from the obtained semantic code or second-language semantic item.
The organization module 704 is configured to organize the second-language words into a sentence according to the grammatical expression rules of the second language.
Described device also comprises word-dividing mode 705, and referring to shown in Figure 8, word-dividing mode 705 is used for when having obtained the sentence of first language literal expression, and the sentence of the first language literal expression that obtains is carried out participle, obtains the speech of first language literal expression.
Described device also comprises interface module 706, is used to the user that operation interface is provided, and obtains sentence, speech and the semantic item etc. of user's input.For example, by the operation interface that provides for the user, obtain with the speech corresponding semantic item of user according to the speech input of first language literal expression; Semantic query module 701 also is used for searching the semantic item corresponding with this speech according to the speech that semantic item comprised that obtains; Molded tissue block 704 also is used for the sentence that will be organized into and is transferred to speech as the semantic item of second language literal expression and searches module; Speech is searched module 703 and is used for the speech of the semantic item of the second language literal expression that will find as the second language literal expression.
Described device also comprises memory module 707, is used to store multilingual speech, semantic item and semantic code etc. and the corresponding relation between them.
The device further comprises a structure analysis module 708, configured to map the obtained words expressed in the first language onto the individual structures of a general semantic structure. The organization module 704 is configured to organize the words expressed in the second language into a sentence according to the grammar rules of the second language and the structure of the general semantic structure to which each word corresponds. Specifically, the organization module 704 organizes the words expressed in the second language into a sentence according to the grammatical expression rules of the second language and a sentence structure code of the general semantic structure that contains the semantic codes. The interface module 706 is configured to obtain, through the operation interface provided for the user, the user's operation of mapping the obtained words onto the structures of the general semantic structure. The semantic query module 701 can also determine the part of speech of each word from the structure division result of the structure analysis module 708, and determine the semantic item of each word according to its part of speech.
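As an illustration of how the organization module might combine the general semantic structure with the grammar rules of the second language, the following sketch orders words by slot. The slot names (`subject`, `predicate`, `object`) and the per-language ordering table are assumptions made for this example, not part of the patent.

```python
# Hypothetical slot names for the general semantic structure and
# hypothetical per-language ordering rules; neither comes from the patent.
SLOT_ORDER = {
    "en": ("subject", "predicate", "object"),   # subject-verb-object
    "ja": ("subject", "object", "predicate"),   # subject-object-verb
}


def organize_words(slot_to_word, language):
    """Return the second-language words in the order required by the language's rule.

    Slots that have no word are skipped. Inflection, particles and
    punctuation are deliberately ignored in this sketch.
    """
    order = SLOT_ORDER[language]
    return [slot_to_word[slot] for slot in order if slot in slot_to_word]


english = organize_words(
    {"subject": "I", "predicate": "eat", "object": "apples"}, "en")
japanese = organize_words(
    {"subject": "私", "predicate": "食べる", "object": "りんご"}, "ja")
```

Because every word is attached to a slot rather than to a position, the same slot assignment can be reordered for any second language that has an entry in the ordering table.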
The modules of the device may be located in multiple physical entities, in which case those physical entities together constitute a system.
The device can be implemented as a plug-in, which makes it convenient to translate between multiple languages in scenarios such as instant messaging, e-mail and web pages.
Referring to Fig. 9, the text conversion system in this embodiment comprises a first user equipment 901, a server 902 and a second user equipment 903.
The first user equipment 901 is configured to obtain a sentence expressed in the first language; to obtain the sentence structure code of that sentence; to output the sentence to the user in the form of the general semantic structure according to the sentence structure code, and/or to output to the user the semantic item corresponding to each word in the sentence according to the sentence structure code; and to send the sentence structure code, once confirmed by the user, to the server 902 and the second user equipment 903. The sentence structure code represents the general semantic structure, and when the structures of the general semantic structure have words corresponding to them, the sentence structure code also contains the semantic codes corresponding to the semantic items of the words in the sentence.
The server 902 is configured to determine whether it locally holds a sentence structure code corresponding to the sentence expressed in the first language. If so, it sends the sentence structure code of the sentence to the first user equipment; otherwise, it either obtains the sentence structure code from the sentence expressed in the first language and sends it to the first user equipment, or instructs the first user equipment to obtain the sentence structure code from that sentence. The server 902 can also store the sentence structure code sent by the first user equipment 901.
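The three outcomes of the server's check can be sketched as follows. The class name, method names, reply tags and the sample code string are hypothetical; the patent does not define a message format.

```python
class StructureCodeServer:
    """Sketch of the server-side lookup of sentence structure codes."""

    def __init__(self, analyze=None):
        self._codes = {}          # sentence -> stored sentence structure code
        self._analyze = analyze   # optional server-side analysis function

    def request_code(self, sentence):
        # Case 1: the code is already held locally, so return it.
        if sentence in self._codes:
            return ("code", self._codes[sentence])
        # Case 2: the server derives the code itself, if it is able to.
        if self._analyze is not None:
            return ("code", self._analyze(sentence))
        # Case 3: the first user equipment is told to derive the code.
        return ("analyze_on_device", None)

    def save_code(self, sentence, code):
        """Store a sentence structure code sent by the first user equipment."""
        self._codes[sentence] = code


server = StructureCodeServer()
first_reply = server.request_code("I eat apples")
server.save_code("I eat apples", "subject=S0101;predicate=S0202;object=S0303")
second_reply = server.request_code("I eat apples")
```

In this sketch only codes sent back by the first user equipment are stored, reflecting the description's statement that the code is sent after the user has confirmed it.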
The second user equipment 903 is configured to obtain the words expressed in the second language according to the semantic codes in the obtained sentence structure code, and to organize those words into a sentence according to the grammar rules of the second language and the sentence structure code. The second user equipment 903 is further configured, in response to an operation of clicking the organized sentence or a word in it, to look up the semantic item corresponding to that word and/or its structure in the general semantic structure, and to output the result to the user.
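A minimal sketch of the first half of this step, turning a received sentence structure code into second-language words, is given below. The textual code format (`slot=semantic code` pairs separated by semicolons), the slot names and the small dictionary are all assumptions made for illustration.

```python
def parse_structure_code(code):
    """Split a hypothetical 'slot=semantic_code;...' string into a dict."""
    slots = {}
    for part in code.split(";"):
        slot, _, semantic_code = part.partition("=")
        slots[slot] = semantic_code
    return slots


def words_for_structure(code, words_by_semantic_code):
    """Map each slot of the structure code to a second-language word."""
    return {
        slot: words_by_semantic_code[semantic_code]
        for slot, semantic_code in parse_structure_code(code).items()
    }


# Hypothetical second-language (Chinese) dictionary keyed by semantic code.
chinese_words = {"S0101": "我", "S0202": "吃", "S0303": "苹果"}
received = "subject=S0101;predicate=S0202;object=S0303"
slot_words = words_for_structure(received, chinese_words)
```

Keeping the slot name next to each word also supports the click-to-inspect behaviour described above, since the structure of any word in the output sentence can be reported directly.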
The first user equipment 901 and the second user equipment 903 can be the same kind of device, playing different roles in different scenarios. Information can be transmitted among the first user equipment 901, the server 902 and the second user equipment 903 through an instant messaging system, a mail system, a web (WEB) system or the like.
The software that implements the embodiment of the invention can be stored on storage media such as floppy disks, hard disks, optical discs and flash memory.
According to a word in the first language, the embodiment of the invention looks up a word in the second language that has the same or a similar meaning, thereby translating the first language into the second language with a more accurate text conversion result. When a word has multiple semantic items, the most appropriate one can be determined by frequency of use, part of speech, a user indication or the like, so the second-language word obtained from that semantic item is also more accurate.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is also intended to include them.
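The selection among multiple semantic items can be sketched as below. The description lists user indication, part of speech and frequency of use as alternatives; the priority order used here (user indication first, then part of speech, then frequency) and the dictionary keys are assumptions of this sketch.

```python
def choose_semantic_item(candidates, user_choice=None, part_of_speech=None):
    """Pick one semantic item from several candidates for the same word.

    Each candidate is a dict with the hypothetical keys 'code',
    'part_of_speech' and 'frequency'.
    """
    if not candidates:
        raise ValueError("no semantic items to choose from")

    # 1. An explicit indication from the user wins, if it matches a candidate.
    if user_choice is not None:
        for candidate in candidates:
            if candidate["code"] == user_choice:
                return candidate

    # 2. Otherwise narrow the candidates by part of speech when one is known.
    pool = candidates
    if part_of_speech is not None:
        matching = [c for c in candidates if c["part_of_speech"] == part_of_speech]
        if matching:
            pool = matching

    # 3. Finally take the most frequently used item that remains.
    return max(pool, key=lambda c: c["frequency"])


# Illustrative candidates for the English word "book".
senses = [
    {"code": "S0401", "part_of_speech": "noun", "frequency": 90},
    {"code": "S0402", "part_of_speech": "verb", "frequency": 10},
]
by_frequency = choose_semantic_item(senses)
by_part_of_speech = choose_semantic_item(senses, part_of_speech="verb")
by_user = choose_semantic_item(senses, user_choice="S0402", part_of_speech="noun")
```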
Claims (16)
1. A device for implementing text conversion, characterized in that it comprises:
A semantic query module, configured to obtain the semantic item corresponding to a word according to the obtained word expressed in the first language;
A semantic conversion module, configured to obtain the corresponding semantic code, or the corresponding semantic item expressed in the second language, according to the semantic item corresponding to the word;
A word lookup module, configured to obtain the word expressed in the second language according to the obtained semantic code or semantic item expressed in the second language;
An organization module, configured to organize the words expressed in the second language into a sentence according to the grammatical expression rules of the second language.
2. The device as claimed in claim 1, characterized in that it further comprises: an interface module, configured to obtain, through an operation interface provided for the user, the semantic item corresponding to a word that the user inputs according to the word expressed in the first language;
The semantic query module is further configured to look up, according to the words contained in the obtained semantic item, the semantic items corresponding to those words;
The organization module is further configured to pass the organized sentence to the word lookup module as the semantic item expressed in the second language;
The word lookup module is configured to use the obtained semantic item expressed in the second language as the word expressed in the second language.
3. The device as claimed in claim 1, characterized in that it further comprises: a structure analysis module, configured to map the obtained words expressed in the first language onto the structures of a general semantic structure;
The organization module is configured to organize the words expressed in the second language into a sentence according to the grammatical expression rules of the second language and the structure of the general semantic structure to which each word corresponds.
4. The device as claimed in claim 3, characterized in that the organization module is configured to organize the words expressed in the second language into a sentence according to the grammatical expression rules of the second language and a sentence structure code of the general semantic structure that contains the semantic codes.
5. The device as claimed in claim 3, characterized in that it further comprises: an interface module, configured to obtain, through an operation interface provided for the user, the user's operation of mapping the obtained words onto the structures of the general semantic structure.
6. The device as claimed in claim 3, characterized in that the semantic query module is configured, when multiple semantic items corresponding to a word are obtained, to determine one semantic item from among them according to an indication from the user; or to determine, according to the frequency of use of the semantic items, the semantic item with the highest frequency of use; or to determine one semantic item from among them according to the part of speech of the word.
7. The device as claimed in claim 1, characterized in that it further comprises: an interface module, configured to obtain an operation of clicking the organized sentence or a word in the sentence;
The semantic query module is further configured to look up the semantic item corresponding to the word that was clicked in the organized sentence.
8. A word conversion method, characterized in that it comprises the following steps:
Obtaining the semantic item corresponding to a word according to the obtained word expressed in the first language;
Obtaining the corresponding semantic code, or the corresponding semantic item expressed in the second language, according to the semantic item corresponding to the word;
Obtaining the word expressed in the second language according to the obtained semantic code or semantic item expressed in the second language;
Organizing the words expressed in the second language into a sentence according to the grammar rules of the second language.
9. The word conversion method as claimed in claim 8, characterized in that the step of obtaining the semantic item corresponding to a word according to the obtained word expressed in the first language comprises: obtaining, through an operation interface provided for the user, the semantic item corresponding to the word that the user inputs according to the word expressed in the first language;
The step of obtaining the corresponding semantic item expressed in the second language according to the semantic item corresponding to the word comprises:
Looking up, according to the words contained in the obtained semantic item, the semantic items corresponding to those words;
Looking up the corresponding semantic code, or the corresponding semantic item expressed in the second language, according to the semantic items that were found;
Obtaining the words expressed in the second language according to the semantic code that was found or the semantic item expressed in the second language;
Organizing the words expressed in the second language into a sentence according to the grammatical expression rules of the second language, thereby obtaining the semantic item expressed in the second language;
The step of obtaining the word expressed in the second language according to the semantic item expressed in the second language that was found comprises: using the semantic item expressed in the second language that was found as the word expressed in the second language.
10. The word conversion method as claimed in claim 8, characterized in that, before the words expressed in the second language are organized into a sentence according to the grammatical expression rules of the second language, the obtained words expressed in the first language are mapped onto the structures of a general semantic structure;
The step of organizing the words expressed in the second language into a sentence according to the grammar rules of the second language comprises: organizing the words expressed in the second language into a sentence according to the grammar rules of the second language and the structure of the general semantic structure to which each word corresponds.
11. The word conversion method as claimed in claim 10, characterized in that the step of organizing the words expressed in the second language into a sentence according to the grammatical expression rules of the second language and the structure of the general semantic structure to which each word corresponds comprises: organizing the words expressed in the second language into a sentence according to the grammatical expression rules of the second language and a sentence structure code of the general semantic structure that contains the semantic codes.
12. The word conversion method as claimed in claim 10, characterized in that the step of mapping the obtained words onto the structures of the general semantic structure comprises: obtaining, through an operation interface provided for the user, the user's operation of mapping the obtained words onto the structures of the general semantic structure.
13. The word conversion method as claimed in claim 8, characterized in that, when multiple semantic items corresponding to a word are obtained, one semantic item is determined from among them according to an indication from the user; or the semantic item with the highest frequency of use is determined from among them according to the frequency of use of the semantic items; or one semantic item is determined from among them according to the part of speech of the word.
14. The word conversion method as claimed in claim 8, characterized in that it further comprises the step of: looking up, in response to clicking the organized sentence or a word in the sentence, the semantic item corresponding to that word and/or its structure in the general semantic structure.
15. A system for implementing text conversion, characterized in that it comprises:
A first user equipment, configured to obtain a sentence expressed in the first language; to obtain the sentence structure code of that sentence; to output the sentence to the user in the form of a general semantic structure according to the sentence structure code, and/or to output to the user the semantic item corresponding to each word in the sentence according to the sentence structure code; and to send the sentence structure code, once confirmed by the user, to the server and the second user equipment; wherein the sentence structure code represents the general semantic structure, and when the structures of the general semantic structure have words corresponding to them, the sentence structure code also contains the semantic codes corresponding to the semantic items of the words in the sentence;
A server, configured to determine whether it locally holds a sentence structure code corresponding to the sentence expressed in the first language; if so, to send the sentence structure code of the sentence to the first user equipment; otherwise, to obtain the sentence structure code from the sentence expressed in the first language and send it to the first user equipment, or to instruct the first user equipment to obtain the sentence structure code from the sentence expressed in the first language;
A second user equipment, configured to obtain the words expressed in the second language according to the semantic codes in the obtained sentence structure code, and to organize the words expressed in the second language into a sentence according to the grammatical expression rules of the second language and the sentence structure code.
16. The system as claimed in claim 15, characterized in that the second user equipment is further configured, in response to an operation of clicking the organized sentence or a word in the sentence, to look up the semantic item corresponding to the word in the sentence and/or its structure in the general semantic structure, and to output the result to the user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910076370A CN101777043A (en) | 2009-01-14 | 2009-01-14 | Word conversion method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101777043A (en) | 2010-07-14 |
Family
ID=42513509
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200910076370A Pending CN101777043A (en) | 2009-01-14 | 2009-01-14 | Word conversion method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101777043A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017121316A1 (en) * | 2016-01-11 | 2017-07-20 | 陈勇 | Speech converter |
CN107301172A (en) * | 2017-06-22 | 2017-10-27 | 秦男 | Data processing method and storage medium |
CN107783968A (en) * | 2017-11-23 | 2018-03-09 | 浪潮金融信息技术有限公司 | A kind of language transfer method, device, computer-readable recording medium and storage control |
CN110688840A (en) * | 2019-09-26 | 2020-01-14 | 联想(北京)有限公司 | Text conversion method and device |
CN110807140A (en) * | 2019-10-31 | 2020-02-18 | 北京金堤科技有限公司 | Effective data extraction method and device |
CN111538862A (en) * | 2020-05-15 | 2020-08-14 | 北京百度网讯科技有限公司 | Method and device for explaining video |
CN111831384A (en) * | 2020-07-20 | 2020-10-27 | 北京百度网讯科技有限公司 | Language switching method and device, equipment and storage medium |
CN111831384B (en) * | 2020-07-20 | 2024-01-09 | 北京百度网讯科技有限公司 | Language switching method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20100714 |