CN105843802A - Corpus intervention module and method in translation - Google Patents

Info

Publication number
CN105843802A
Authority
CN
China
Prior art keywords
corpus
translation
matching
module
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610202189.1A
Other languages
Chinese (zh)
Inventor
白晓文
陈春纬
刘庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changan University
Original Assignee
Changan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changan University
Priority to CN201610202189.1A
Publication of CN105843802A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a corpus intervention module and method for use in translation. The aim is to support corpus retrieval and comparison so that matched corpus entries can easily be brought into the translation, which shortens translation time and improves consistency of expression. The technical solution is as follows: a corpus reading module selectively reads a historical corpus and a corpus prepared for the translation task; a translation material reading module opens the material to be translated and splits it into sentences; a corpus and translation material retrieval and matching module searches the sentence-split material, sentence by sentence, for the maximum corpus match, and obtains the position of each matched entry in the text together with its gloss; a matched corpus display module displays the matched entries and their translations in visually distinct form; finally, a matched corpus intervention module copies the translation of a matched entry and pastes it at a selected position in the translation, thereby intervening in the translation.

Description

Corpus intervention module and method in translation

Technical field

The invention belongs to the technical field of computational linguistics and translation, and in particular relates to a corpus intervention module and method for use in translation.

Background

The word corpus comes from Latin, where it originally meant "collection" or "anthology"; its plural is corpora or corpuses. A corpus is "a collection of works, and the whole body of texts on any given subject" (OED) and "a collection of written or spoken material that provides a basis for linguistic analysis" (OED). A corpus is "a collection of language-use material selected and ordered according to explicit linguistic criteria, intended to serve as a sample of the language" (Sinclair, 1986: 185-203). A corpus is a large body of text assembled for a specific purpose according to explicit design criteria (Atkins and Clear, 1992: 1-16). Renouf holds that a corpus is "a body of text made up of a large collection of written or spoken language, stored and processed by computer, and used for linguistic research" (Renouf, 1987: 1). Leech points out that large collections of machine-readable electronic text are the basis for obtaining the "necessary frequency data" in probabilistic research methods: "To obtain the necessary frequency data, we must analyze a sufficient quantity of natural English (or other language) text, so as to make realistic predictions on the basis of observed frequency. We therefore need reliable machine-readable collections of electronic text, that is, machine-readable corpora" (Leech, 1987: 2).

In summary, a corpus has the following basic characteristics:

1) The design and construction of a corpus follow systematic principles of theoretical linguistics, and corpus development has a clear and specific research goal. For example, the main purpose of the BROWN corpus of the early 1960s was the grammatical analysis of American English. The later LOB corpus collected British English of the same period, largely following the design principles of the BROWN corpus, in order to support comparative and grammatical analysis of American and British English.

2) The composition and sampling of a corpus follow explicit linguistic principles and use random sampling, rather than simply piling up material. The collected material must be naturally occurring data of language in use.

3) As a sample of natural language use, a corpus must be representative (representativeness). Chomsky criticized corpora as attempts to represent a vast, even infinite, body of actual language material with a very small sample, so that the result is inevitably skewed and unrepresentative: "natural corpora are so severely skewed that a description based on them would be no more than a word list" (Chomsky, 1962: 159). This criticism is valuable for any research that relies on probability and statistics (McEnery, 1996: 5).

Li Wenzhong holds that corpus text consists of running text or continuous stretches of discourse, not isolated sentences and words. In corpus research, the grammatical relations and usage of a search term, and large numbers of observations about it, are obtained by analyzing the context provided.

Current research on corpora is mostly theoretical. It serves corpus-based translation studies and does not address concrete practical applications. The corpora chosen are research corpora, and most cannot be used directly in actual translation practice. How a corpus should intervene in translation, or how it can assist translation, is not specifically discussed. The translation industry currently has no mature terminology intervention tool; translators usually consult references manually, which is inefficient.

Summary of the invention

To solve the problems in the prior art, the present invention provides a corpus intervention module and method for translation that support corpus retrieval and comparison during translation, so that matched corpus entries can easily be brought into the translation, which shortens translation time and improves consistency of expression.

To achieve this, the invention adopts the following technical solution:

A corpus intervention module for translation, comprising:

Corpus reading module: selectively reads a historical corpus and a corpus prepared for the translation task;

Translation material reading module: opens and reads the material to be translated, and splits it into sentences;

Corpus and translation material retrieval and matching module: for the sentence-split material, searches each sentence in turn, starting from its first word, for the maximum corpus match, and obtains the position of each matched entry in the text together with its gloss;

Matched corpus display module: displays the matched entries and their translations in visually distinct form;

Matched corpus intervention module: copies the translation of a matched entry and pastes it at a selected position in the translation, thereby intervening in the translation.

A corpus intervention method for translation, comprising the following steps:

1) The translation material reading module opens and reads the material to be translated and splits it into sentences; at the same time, the corpus reading module selectively reads the historical corpus and the corpus prepared for the translation task;

2) For the sentence-split material, the corpus and translation material retrieval and matching module searches each sentence in turn, starting from its first word, for the maximum corpus match, and obtains the position of each matched entry in the text together with its gloss; the matched corpus display module then displays the matched entries and their translations in visually distinct form;

3) The matched corpus intervention module copies the translation of a matched entry and pastes it at a selected position in the translation, thereby achieving corpus intervention in the translation.

In step 1), the translation material reading module calls Word's COM interface to obtain the text of WordPad and Word documents, and calls Excel's COM interface to obtain the text of Excel spreadsheets.

In step 1), the translation material reading module defines sentence terminators according to punctuation rules and splits the material to be translated into sentences; a terminator marks the end of a sentence.

For an English period, the translation material reading module must determine whether it is abbreviation punctuation. The lexicon contains abbreviations, and the module searches the lexicon for the period together with the word preceding it. If the search succeeds, the period is abbreviation punctuation and is not treated as a sentence terminator.
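The sentence-splitting rule described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the `ABBREVIATIONS` set, the terminator string and the function name `split_sentences` are assumptions made for the example, and the text is assumed to be whitespace-delimited English.

```python
import re

# Hypothetical abbreviation lexicon; the patent only states that the
# lexicon contains abbreviations, not which ones.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "e.g.", "i.e.", "etc."}
TERMINATORS = ".!?"  # assumed set of sentence terminators


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text into sentences at terminators, skipping abbreviation periods."""
    sentences = []
    start = 0
    for match in re.finditer(r"\S+", text):
        token = match.group()
        if token[-1] not in TERMINATORS:
            continue
        # A period together with the word before it is looked up in the
        # lexicon; a hit means abbreviation punctuation, not a sentence end.
        if token[-1] == "." and token.lower() in abbreviations:
            continue
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:  # keep trailing text that has no terminator
        sentences.append(tail)
    return sentences
```

For instance, `split_sentences("Dr. Smith arrived. Is he late?")` keeps "Dr." inside the first sentence. A lexicon lookup alone cannot tell when an abbreviation such as "etc." also ends a sentence, so that case is left unsplit in this sketch.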

In step 1), the corpus reading module stores the entries read from the historical corpus and from the corpus prepared for the translation task in a list, and sorts the entries alphabetically.
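One reason an alphabetically sorted list helps is that it can be searched by bisection instead of a linear scan. The sketch below assumes entries are (phrase, gloss) pairs compared in lower case; the function names `load_entries` and `lookup` and the three result labels are illustrative choices, not taken from the patent.

```python
from bisect import bisect_left


def load_entries(pairs):
    """Return (phrase, gloss) pairs sorted alphabetically by lower-cased phrase."""
    return sorted(((src.lower(), gloss) for src, gloss in pairs),
                  key=lambda pair: pair[0])


def lookup(entries, phrase):
    """Classify phrase as 'full', 'sub' (leading part of an entry) or 'none'."""
    phrase = phrase.lower()
    keys = [src for src, _ in entries]  # rebuilt on each call for brevity
    i = bisect_left(keys, phrase)       # first key >= phrase
    if i < len(keys) and keys[i] == phrase:
        return "full", entries[i][1]
    # In sorted order, any entry that continues the phrase with another
    # word sits at the bisection point.
    if i < len(keys) and keys[i].startswith(phrase + " "):
        return "sub", None
    return "none", None
```

A real tool would build the key list once when the corpus is loaded rather than on every lookup.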

In step 2), the corpus and translation material retrieval and matching module matches the material to be translated by the following specific steps:

2.1) Append one word to the word group and search the corpus list for the word group;

2.2) If a fully matching entry is found, save the entry's information, then return to step 2.1) to search for a larger match;

2.3) If a sub-match is found, that is, the word group is part of an entry, return to step 2.1) and continue searching;

2.4) If no match is found, clear the word group and return to step 2.1), starting after the last matched word group, until all the translation material has been searched.
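Steps 2.1) to 2.4) amount to a greedy longest-match scan over the words of a sentence. The following sketch is one possible reading under stated assumptions: the corpus is passed as a dictionary from lower-cased phrase to gloss, words are already stripped of punctuation, a "sub-match" means the word group is the leading part of an entry, and when no entry starts at a word the scan simply advances by one word (the patent does not spell this out). The name `find_matches` is hypothetical.

```python
from bisect import bisect_left


def find_matches(words, corpus):
    """Greedy longest-match scan of one sentence.

    words  -- the sentence as a list of words
    corpus -- dict mapping a lower-cased source phrase to its gloss
    Returns (start, end, phrase, gloss) tuples; end is exclusive.
    """
    keys = sorted(corpus)  # alphabetical list of entries
    matches = []
    start = 0
    while start < len(words):
        best = None
        end = start
        while end < len(words):
            end += 1  # step 2.1: take one more word into the word group
            group = " ".join(words[start:end]).lower()
            i = bisect_left(keys, group)
            if i < len(keys) and keys[i] == group:
                # step 2.2: full match, remember it and look for a larger one
                best = (start, end, group, corpus[group])
                continue
            if i < len(keys) and keys[i].startswith(group + " "):
                continue  # step 2.3: sub-match, keep extending
            break  # step 2.4: no match, stop extending this word group
        if best:
            matches.append(best)
            start = best[1]  # resume after the last matched word group
        else:
            start += 1  # assumption: skip one word when nothing matched
    return matches
```

With a corpus containing both "machine translation" and "machine translation system", the scan over "The machine translation system uses a corpus" reports only the longer entry, which is the "maximum match" behavior the steps describe.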

In step 2), the matched corpus display module displays the translation of each marked matched entry in a floating window or as a symbol-delimited annotation, and the translation can be edited.

Compared with the prior art, the present invention uses the corpus reading module to selectively read the historical corpus and the corpus prepared for the translation task, and uses the translation material reading module to open and read the material to be translated and split it into sentences. The corpus and translation material retrieval and matching module searches each sentence of the split material in turn, starting from its first word, for the maximum corpus match, and obtains the position of each matched entry in the text together with its gloss. The matched corpus display module displays the matched entries and their translations in visually distinct form. Finally, the matched corpus intervention module copies the translation of a matched entry and pastes it at a selected position in the translation, thereby intervening in the translation. Corpus retrieval and comparison are thus available during translation, and matched entries can easily be brought into the translation, which shortens translation time and improves consistency of expression.

Further, the translation material reading module defines sentence terminators according to punctuation rules, splits the material to be translated into sentences, and treats a terminator as the end of a sentence. For an English period it must determine whether the period is abbreviation punctuation: the lexicon contains abbreviations, the module searches the lexicon for the period together with the preceding word, and if the search succeeds the period is abbreviation punctuation and is not treated as a sentence terminator. This makes sentence splitting more accurate and improves translation efficiency.

Further, the corpus reading module can selectively read the historical corpus and the corpus specially prepared for the current translation task, or can read the corpus prepared for the current task as the primary source and the historical corpus as an auxiliary reference. The entries read are stored in a list and sorted alphabetically, which improves the efficiency of corpus matching search and thus shortens translation time.

Further, the corpus and translation material retrieval and matching module applies the maximum corpus match principle to the material to be translated, which gives better corpus matching for that material and further improves the efficiency of the invention.

Detailed description

The present invention is further explained below with reference to specific embodiments.

The invention consists of five modules:

Module 1: Corpus reading module. It can selectively read the historical corpus and the corpus specially prepared for the current translation task, or can read the corpus prepared for the current task as the primary source and the historical corpus as an auxiliary reference. The entries read are stored in a list and sorted alphabetically to improve the efficiency of corpus matching search;
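Reading the project corpus as primary and the historical corpus as auxiliary could look roughly like the sketch below. It assumes each corpus is a sequence of (phrase, gloss) pairs and that a project entry should override a historical entry for the same phrase; the override rule, the source labels and the name `merge_corpora` are assumptions for illustration only.

```python
def merge_corpora(project_pairs, historical_pairs):
    """Combine two corpora into one alphabetical list of (phrase, gloss, source)."""
    merged = {}
    for phrase, gloss in historical_pairs:
        merged[phrase.lower()] = (gloss, "historical")
    # Project entries are loaded last so that they take precedence
    # over historical entries for the same phrase (assumed policy).
    for phrase, gloss in project_pairs:
        merged[phrase.lower()] = (gloss, "project")
    return [(phrase, gloss, source)
            for phrase, (gloss, source) in sorted(merged.items())]
```

Keeping the source label on each entry also gives a display layer what it needs to show project and historical entries in different colors.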

Module 2: Translation material reading module. It opens the material to be translated and, while opening it, splits it into sentences. English text is split into individual sentences according to punctuation and rules. Sentence terminators are defined, such as the English period, exclamation mark and question mark, and a terminator marks the end of a sentence. For an English period the module must also determine whether it belongs to an abbreviation: the lexicon contains abbreviations, the module searches the lexicon for the period together with the preceding word, and if the search succeeds the period is abbreviation punctuation and is not treated as a sentence terminator;

Module 3: Corpus and translation material retrieval and matching module. For the sentence-split translation material, it searches each sentence in turn, starting from its first word, for the maximum corpus match, and obtains the position of each matched entry in the text together with the entry itself (entry + gloss). Specifically: (1) append one word to the word group and search the corpus list for the word group; (2) if a fully matching entry is found, save the entry's information (position + entry + gloss) and return to step (1) to search for a larger match; (3) if a sub-match is found (the word group is part of an entry), return to step (1); (4) if no match is found, clear the word group and return to step (1), starting after the last matched word group, until all the translation material has been searched;

Module 4: Matched corpus display module. Every entry that has been tagged is an entry that has already been matched. When the sentence is being translated, matches can be displayed in several ways:

1) Display mode 1: the matched entry is shown in color (the color is configurable, and two colors can be set to distinguish entries from the corpus prepared for the current translation task from entries in the historical corpus). When the mouse is placed over the entry, a text box containing its translation appears beside the pointer. When the mouse moves onto the text box, the translation can be copied; when the mouse leaves the text box, the box closes;

2) Display mode 2: the matched entry is shown in color (configurable as in mode 1), and its translation is displayed directly after the entry, enclosed in the configured symbols;

3) Display mode 3: the matched entry is shown in color (configurable as in mode 1), and its translation floats above the entry. When the mouse moves onto the translation, the entry can be edited; for example, the content of the translation can be copied;

Module 5: Matched corpus intervention module. Whichever display mode is used, the entry's translation can be copied and then pasted at a selected position in the translation, thereby intervening in the translation.

The complete steps of the method are as follows:

Open the text to be translated in the tool interface (the format may be Word, Excel, Notepad, WordPad and so on). Plain text files are read directly with a general-purpose file reading module; for WordPad and Word documents, Word's COM interface is called to obtain the text; for Excel files, Excel's COM interface is called to obtain the text in the spreadsheet. Then click "Corpus intervention" (the corpus is a historical corpus or one specially prepared for this project) and follow the prompts to select the corpus file (the entry list is displayed in two columns, entries on the left and glosses on the right). The corpus and translation material retrieval and matching module is then called to obtain the matched entry information;

Matches can be displayed in two ways. 1) The gloss is shown directly after the matched entry inside special symbols, for example 【】. The retrieval and matching module supplies the position of each matched entry in the text to be translated. To avoid having each insertion shift the positions of the remaining entries, the glosses are inserted into the text from back to front;

2) Hover display: when the mouse moves over an entry and stays there longer than a set time (3 seconds by default), the mouse position is read and the sentence at that position is retrieved. The sentence is passed through the retrieval and matching module, and the entries matched in that sentence are shown in a floating window that pops up at the mouse position;
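The back-to-front insertion used by the first display method can be illustrated with a short sketch. It assumes each match is given as the character offset where the matched entry ends, paired with its gloss; the function name `annotate` and its parameters are hypothetical.

```python
def annotate(text, matches, left="【", right="】"):
    """Insert each gloss, wrapped in brackets, right after its matched entry.

    matches -- (end offset of the matched entry in text, gloss) pairs
    """
    # Process matches from the last offset to the first, so that inserting
    # a gloss never shifts the offsets of matches still to be processed.
    for end, gloss in sorted(matches, reverse=True):
        text = text[:end] + left + gloss + right + text[end:]
    return text
```

Inserting front to back would instead require adding the length of every earlier insertion to each later offset, which is the bookkeeping the back-to-front order avoids.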

The gloss shown by either method can be copied directly and pasted at the desired position in the translation, which completes the corpus intervention in the translation.

The invention supports corpus retrieval and comparison during translation, and matched entries can easily be brought into the translation, which shortens translation time and improves consistency of expression.

Claims (8)

1.一种翻译中语料介入模块,其特征在于,包括:1. A corpus intervention module in translation, characterized in that, comprising: 语料读取模块:用于选择性读取历史语料库和为翻译活动备制的语料库;Corpus reading module: used to selectively read historical corpora and corpora prepared for translation activities; 翻译材料读取模块:用于打开需要翻译的材料,读取所述需要翻译的材料,并对所述需要翻译的材料进行分句处理;Translation material reading module: used to open the material that needs to be translated, read the material that needs to be translated, and process the material that needs to be translated by sentence; 语料和翻译材料检索匹配模块:用于对读取并经过分句处理的所述需要翻译的材料,逐句从第一个单词开始依次搜索最大语料匹配,最终得到匹配语料在文本中位置和语料释义;Corpus and translation material retrieval matching module: used to search for the largest corpus match sentence by sentence starting from the first word for the read and sentence-processed materials that need to be translated, and finally obtain the position and corpus of the matching corpus in the text Interpretation; 匹配语料显示模块:用于将匹配的语料和语料的译文区别显示出来;Matching corpus display module: used to display the matching corpus and the translation of the corpus; 匹配语料介入翻译模块:用于对匹配的语料译文进行复制,并在翻译中选择位置粘贴,从而实现对翻译的介入。Matching corpus intervention translation module: used to copy the matched corpus translation and paste it in the selected position in the translation, so as to realize the intervention of translation. 2.一种翻译中语料介入方法,其特征在于,包括以下步骤:2. A method for intervening corpus in translation, comprising the following steps: 1)翻译材料读取模块打开需要翻译的材料,读取需要翻译的材料,并对需要翻译的材料进行分句处理,同时语料读取模块选择性读取历史语料库和为翻译活动备制的语料库;1) The translation material reading module opens the materials that need to be translated, reads the materials that need to be translated, and performs sentence processing on the materials that need to be translated. 
At the same time, the corpus reading module selectively reads the historical corpus and the corpus prepared for the translation task;
2) The corpus and translation material retrieval and matching module processes the material to be translated, which has been read and split into sentences, sentence by sentence: starting from the first word, it searches in order for the longest corpus match, and finally obtains the position of each matched corpus entry in the text and the gloss of that entry; the matching corpus display module then displays the matched corpus entries and their translations in a visually distinct manner;
3) The matching corpus intervention translation module copies the translation of a matched corpus entry and pastes it at a selected position in the translation, thereby realizing corpus intervention in translation.

3. The corpus intervention method in translation according to claim 2, characterized in that, in step 1), the translation material reading module calls the COM interface of Word to obtain the text of WordPad and Word documents, and calls the COM interface of Excel to obtain the text in the tables of Excel documents.

4. The corpus intervention method in translation according to claim 3, characterized in that, in step 1), the translation material reading module defines sentence terminators according to punctuation rules and splits the material to be translated into sentences, treating each terminator it encounters as the end of a sentence.

5. The corpus intervention method in translation according to claim 4, characterized in that the translation material reading module determines whether an English period is abbreviation punctuation: the lexicon contains abbreviations, and the period together with the word preceding it is looked up in the lexicon; if the lookup succeeds, the period is abbreviation punctuation and is ignored rather than treated as a sentence terminator.

6. The corpus intervention method in translation according to claim 2, characterized in that, in step 1), the corpus reading module saves the corpus entries read from the historical corpus and from the corpus prepared for the translation task as a list, and sorts the entries alphabetically.

7. The corpus intervention method in translation according to claim 2, characterized in that, in step 2), the specific steps by which the corpus and translation material retrieval and matching module matches the material to be translated comprise:
2.1) take one word, append it to the word group, and search the corpus list for the word group;
2.2) if a full match is found, save the information of the matched corpus entry, then return to step 2.1) to search for a longer match;
2.3) if a sub-match is found, that is, the word group is part of a corpus entry, return to step 2.1) and continue searching;
2.4) if no match is found, clear the word group and return to step 2.1), starting after the last matched word group, until all of the translation material has been searched.

8. The corpus intervention method in translation according to claim 2, characterized in that, in step 2), the matching corpus display module displays the translations of the marked matching corpus entries in a floating window or as symbol annotations, and the translations can be edited.
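The sentence splitting of claims 4 and 5 can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: the terminator set, the abbreviation lexicon, and the function name `split_sentences` are all assumptions, since the claims do not enumerate them.

```python
# Hypothetical terminator set and abbreviation lexicon (not given in the claims).
TERMINATORS = {".", "!", "?", "。", "!", "?"}
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "etc.", "vs."}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text at terminators, skipping periods that end a known abbreviation."""
    sentences = []
    start = 0  # index where the current sentence begins
    for i, ch in enumerate(text):
        if ch not in TERMINATORS:
            continue
        if ch == ".":
            # Walk back to the start of the word that precedes the period.
            j = i
            while j > start and not text[j - 1].isspace():
                j -= 1
            # Look up "word + period" in the lexicon, as claim 5 describes.
            if text[j:i + 1].lower() in abbreviations:
                continue  # abbreviation punctuation, not a sentence terminator
        sentence = text[start:i + 1].strip()
        if sentence:
            sentences.append(sentence)
        start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # trailing text without a terminator
    return sentences
```

This sketch only recognizes abbreviations with a single trailing period; multi-period forms such as "e.g." and decimal numbers would need extra rules that the claims do not describe.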
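Steps 2.1) to 2.4) of claim 7, together with the sorted corpus list of claim 6, can be illustrated with the sketch below. It is written under several assumptions: the corpus is modeled as a dictionary from source phrase to translation, words are compared case-insensitively and joined by single spaces, and after a failed search the scan advances by one word. The names `find_matches` and `is_proper_prefix` are hypothetical.

```python
from bisect import bisect_left


def is_proper_prefix(sorted_phrases, group):
    """Sub-match test (step 2.3): is `group` the start of a longer corpus entry?"""
    probe = group + " "
    i = bisect_left(sorted_phrases, probe)
    return i < len(sorted_phrases) and sorted_phrases[i].startswith(probe)


def find_matches(words, corpus):
    """Return (start, end, phrase, translation) for each longest corpus match.

    `words` is one tokenized sentence; `start`/`end` are word indices,
    with `end` exclusive.
    """
    entries = {k.lower(): v for k, v in corpus.items()}
    phrases = sorted(entries)  # alphabetically sorted list, as in claim 6
    matches = []
    pos = 0
    while pos < len(words):
        best = None
        group = ""
        end = pos
        while end < len(words):
            # Step 2.1: extend the word group by one word and search for it.
            word = words[end].lower()
            group = word if not group else group + " " + word
            end += 1
            # Step 2.2: remember a full match, then keep looking for a longer one.
            if group in entries:
                best = (pos, end, group, entries[group])
            # Step 2.4: stop once no longer entry can begin with this group.
            if not is_proper_prefix(phrases, group):
                break
            # Step 2.3: a sub-match was found, so continue extending the group.
        if best:
            matches.append(best)
            pos = best[1]  # resume after the last matched word group
        else:
            pos += 1  # assumption: skip one word when nothing matched
    return matches
```

For example, with the entries "machine" and "machine translation" both in the corpus, the sentence "The machine translation memory works" yields a single match covering words 1 to 2, because the longer entry wins over the shorter one.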
CN201610202189.1A 2016-03-31 2016-03-31 Corpus intervention module and method in translation Pending CN105843802A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610202189.1A CN105843802A (en) 2016-03-31 2016-03-31 Corpus intervention module and method in translation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610202189.1A CN105843802A (en) 2016-03-31 2016-03-31 Corpus intervention module and method in translation

Publications (1)

Publication Number Publication Date
CN105843802A true CN105843802A (en) 2016-08-10

Family

ID=56596566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610202189.1A Pending CN105843802A (en) 2016-03-31 2016-03-31 Corpus intervention module and method in translation

Country Status (1)

Country Link
CN (1) CN105843802A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109683773A (en) * 2017-10-19 2019-04-26 北京国双科技有限公司 Corpus labeling method and device
CN110046261A (en) * 2019-04-22 2019-07-23 山东建筑大学 A kind of construction method of the multi-modal bilingual teaching mode of architectural engineering
CN110263149A (en) * 2019-05-29 2019-09-20 科大讯飞股份有限公司 A kind of textual presentation method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996166A (en) * 2009-08-14 2011-03-30 张龙哺 Bilingual sentence pair modular recording method and translation method and translation system thereof
CN102831109A (en) * 2012-08-08 2012-12-19 中国专利信息中心 Machine translating device based on intelligent matching and method thereof
CN105159892A (en) * 2015-08-28 2015-12-16 长安大学 Corpus extractor and corpus extraction method
CN105183723A (en) * 2015-09-17 2015-12-23 成都优译信息技术有限公司 Associating method for translation software and language material searching

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
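哈乐 (Ha Le): "基于实例的汉阿语言机器翻译系统的研究与实现" [Research and Implementation of an Example-Based Chinese-Arabic Machine Translation System], 《中国优秀硕士学位论文全文数据库信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology] *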

Similar Documents

Publication Publication Date Title
McEnery et al. The Lancaster Corpus of Mandarin Chinese: A corpus for monolingual and contrastive language study
KR101522049B1 (en) Coreference resolution in an ambiguity-sensitive natural language processing system
CN103500160B (en) A kind of syntactic analysis method based on the semantic String matching that slides
Pettersson et al. A multilingual evaluation of three spelling normalisation methods for historical text
CN106383818A (en) Machine translation method and device
KR101266361B1 (en) Automatic translation system based on structured translation memory and automatic translating method using the same
CN107577671A (en) A kind of key phrases extraction method based on multi-feature fusion
CN105045852A (en) Full-text search engine system for teaching resources
CN108984661A (en) Entity alignment schemes and device in a kind of knowledge mapping
KR102601980B1 (en) Patent drawing reference numbers description output method, device and system therefor
CN106528536A (en) Multilingual word segmentation method based on dictionaries and grammar analysis
CN101876975A (en) The Recognition Method of Chinese Place Names
CN108804592A (en) Knowledge library searching implementation method
CN106021392A (en) News key information extraction method and system
CN102789464A (en) Natural language processing method, device and system based on semanteme recognition
CN101751420A (en) Semantics vein document searching method
CN105843802A (en) Corpus intervention module and method in translation
CN113343717A (en) Neural machine translation method based on translation memory library
CN103164398B (en) Utilize the method that Chinese dimension language translated automatically by Chinese dimension e-dictionary
CN103164397A (en) Chinese-Kazakh electronic dictionary and automatic translating Chinese- Kazakh method thereof
Lim et al. Automatic genre detection of web documents
CN103164395A (en) Chinese-Kirgiz language electronic dictionary and automatic translating Chinese-Kirgiz language method thereof
CN103116607B (en) A kind of text retrieval system based on the Chinese phonetic alphabet newly
CN103942188B (en) A kind of method and apparatus identifying language material language
CN109960720B (en) Information extraction method for semi-structured text

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160810