JP2001005814A

JP2001005814A - Device and method for preparing dictionary, computer readable recording medium recording dictionary preparation program and translation device

Info

Publication number: JP2001005814A
Application number: JP11369686A
Authority: JP
Inventors: Naoko Shinozaki; 直子篠崎; Toshiyuki Okunishi; 稔幸奥西
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1999-04-23
Filing date: 1999-12-27
Publication date: 2001-01-12
Anticipated expiration: 2019-12-27
Also published as: JP3696765B2

Abstract

PROBLEM TO BE SOLVED: To provide a dictionary creation device that can respond more flexibly to user requests, and a translation device that uses the created dictionary. SOLUTION: A document containing a mixture of several character types is input through an input part 1. A character string extracting part 4 extracts, from the input document, character strings consisting of a specified character type. A dictionary information preparing part 17 takes each extracted string as a headword, attaches the corresponding dictionary information (such as a translation or a part of speech) to create dictionary data, and registers the data in a data storage part 3. The dictionary data are displayed through an output part 2. When the dictionary information is created, the user can input the information he or she wants used for dictionary creation through a user supporting part 22 as needed. The device therefore lets the user create dictionaries efficiently and translate using the created dictionary.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Field of the Invention: The present invention relates to a dictionary creation device that creates dictionary data referred to when translating an input source text and outputting the translation, and to a translation device that performs translation using the created dictionary data. In particular, it relates to a dictionary creation device, a dictionary creation method, a translation device, and a computer-readable recording medium recording a dictionary creation program, each of which can respond more flexibly to user requests.

[0002]

Description of the Related Art: In recent years many translation devices using electronic computers have been developed, but their results still do not reach the level of human translation. One cause is that the dictionary data used for translation cannot keep up with proper nouns, which increase every day. For such proper nouns it is often sufficient to output the source-language character string unchanged in the translation result. Moreover, with internationalization, concepts from one language are sometimes used as-is inside another language and treated as though they were words or idioms of that language. When such a word or idiom appears in a source text to be translated, the result is often easier to read if the word or idiom is output unchanged rather than replaced by a literal translation.

[0003] Words and idioms of one language that appear inside text of another language are hard to distinguish when the source and target languages are written with the same type of characters. When the languages use different character types, however, as with English (alphabetic characters) and Japanese (kanji, hiragana, and katakana), such words are easy to identify from the character type alone.
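Character-type identification of this kind can be illustrated with a short Python sketch. It classifies each character by standard Unicode block ranges and then groups consecutive characters of the same type. The function names and the set of types are assumptions made for illustration. The set includes full-width Latin letters and an Arabic bucket for the third-language case raised in [0008]. This is not the method of the cited publication.

```python
from itertools import groupby

def char_type(ch: str) -> str:
    """Classify one character by Unicode block (illustrative subset of types)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"  # CJK Unified Ideographs, main block only
    if ch.isascii() and ch.isalpha():
        return "alphabet"
    if 0xFF21 <= code <= 0xFF3A or 0xFF41 <= code <= 0xFF5A:
        return "alphabet"  # full-width Latin letters
    if 0x0600 <= code <= 0x06FF:
        return "arabic"
    return "other"

def segment_by_type(text: str) -> list[tuple[str, str]]:
    """Split text into maximal runs of a single character type."""
    return [(kind, "".join(run)) for kind, run in groupby(text, key=char_type)]
```

For the mixed sentence "私はJavaが好き", the alphabetic run "Java" comes out as one segment. That is the kind of string a dictionary builder would pick up.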

[0004] One translation device that uses character type in this way is the machine translation device disclosed in Japanese Patent Application Laid-Open No. Hei 8-171564. That publication describes how to build a machine translation device that can use an unknown word written in alphabetic characters in the source text, unchanged, as its translation. It is aimed in particular at devices that translate Japanese into another language.

[0005]

Problems to Be Solved by the Invention: However, the technique disclosed in Japanese Patent Application Laid-Open No. Hei 8-171564 only determines the character types in the input source text for machine translation. It does not show the results of that determination being applied to any other processing.

[0006] Documents already written by people, independently of machine translation, can sometimes be more useful. Suppose a person has used the English text "xx yyy zzz" within Japanese text. That choice shows the writer's intent that, when "xx yyy zzz" is rendered in Japanese, the English string itself (the string in the original language) is preferable or sufficient. The English string reflecting that intent can therefore also be used when translating with English-to-Japanese machine translation. Many documents are now available in electronic form, for example via the Internet. Electronic documents can easily be subjected to automatic processing for translation, so words can be accumulated with little effort and to considerable effect.

[0007] Furthermore, Japanese Patent Application Laid-Open No. Hei 8-171564 says little about how to extract character strings from a document. In practice, however, the extraction accuracy and the degree of freedom for user customization determine the quality of the translation. In some cases, letting the user freely change the information that governs extraction accuracy also improves translation accuracy.

[0008] To provide a high degree of freedom in character string extraction, it is also desirable to be able to identify character strings of a third language in addition to alphabetic strings. For example, when an Arabic string appears in a Japanese document, it is desirable that the same string appear unchanged in the English document produced by translation.

[0009] Accordingly, an object of the present invention is to provide a dictionary creation device, a dictionary creation method, a translation device, and a computer-readable recording medium recording a dictionary creation program, each of which can respond more flexibly to user requests.

[0010]

Means for Solving the Problems: A dictionary creation device according to one aspect of the present invention creates dictionary information that is used for translating from a first language, written in a first type of characters, into a second language, written in a second type of characters. The dictionary information registers headwords that are not registered in the existing dictionary used for translation. The device includes three units:
- an input unit for inputting information that includes a document in which multiple types of characters are mixed;
- a character string extraction unit that extracts one or more character strings consisting of the first type of characters from the document input through the input unit, and outputs extraction information in which the extracted strings are registered;
- a dictionary information creation unit that takes each extracted string in the extraction information as a headword and creates dictionary information by attaching information about each headword as headword information.
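A minimal Python sketch of the extraction and dictionary-creation steps follows. It assumes that the "first type of character" is ASCII Latin letters, with single spaces or hyphens allowed between words so that a multi-word name stays one string. It also assumes that each headword's provisional translation is the string itself, tagged as a noun. The regular expression and both function names are hypothetical.

```python
import re

# Assumed definition of a first-type run: Latin words joined by single spaces or hyphens.
_FIRST_TYPE_RUN = re.compile(r"[A-Za-z]+(?:[ \-][A-Za-z]+)*")

def extract_strings(document: str) -> list[str]:
    """Character string extraction: distinct runs in order of first appearance."""
    return list(dict.fromkeys(m.group() for m in _FIRST_TYPE_RUN.finditer(document)))

def create_dictionary_info(extracted: list[str]) -> dict[str, dict[str, str]]:
    """Dictionary information creation: each headword is, provisionally,
    its own translation (output unchanged), with a default part of speech."""
    return {s: {"translation": s, "pos": "noun"} for s in extracted}
```

Deduplicating with `dict.fromkeys` keeps the order of first appearance, which makes the resulting dictionary easy to review.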

[0011] With the dictionary creation device described above, character strings consisting of the first type of characters are extracted from a document containing multiple types of characters, and dictionary information is created with these strings registered as headwords. The resulting dictionary information contains headwords that are not in the existing dictionary consulted when translating a first-language document (written in the first type of characters) into a second-language document (written in the second type of characters).

[0012] Some terms, such as new words and technical terms, are deliberately written in the first language even within second-language text, and are not registered in the existing dictionary. Dictionary information that registers such terms as headwords can therefore be created. Referring to this dictionary information when translating a first-language document into a second-language document makes more accurate translation possible.

[0013] A dictionary creation device according to another aspect of the present invention likewise creates dictionary information used for translating from a first language, written in a first type of characters, into a second language, written in a second type of characters. That dictionary information registers headwords not found in the existing dictionary used for translation. The device includes four units:
- an input unit for inputting information that includes a document in which multiple types of characters are mixed;
- a character string extraction unit that extracts one or more character strings consisting of the first type of characters from the document input through the input unit, and outputs extraction information in which the extracted strings are registered;
- a dictionary information creation unit that takes each extracted string in the extraction information as a headword and creates dictionary information by attaching information about each headword as headword information;
- an information correction unit that corrects the contents of the dictionary information created by the dictionary information creation unit as desired, in accordance with information input from outside through the input unit.

[0014] With the dictionary creation device described above, character strings consisting of the first type of characters are extracted from a document containing multiple types of characters, and dictionary information is created with these strings registered as headwords. This dictionary information contains headwords that are not in the existing dictionary consulted when translating a first-language document into a second-language document. In addition, the information correction unit can change the created dictionary information as the user wishes, according to information supplied from outside, for example by the user. User-customized or highly accurate dictionary information is therefore easy to obtain.

[0015] Dictionary information can therefore be created for terms (new words, technical terms) that are deliberately written in the first language even within second-language text and that are not registered in the existing dictionary. The user can then customize this dictionary information. Referring to it when translating a first-language document written in the first type of characters into a second-language document written in the second type of characters makes more accurate translation possible.

[0016] In the dictionary creation device described above, the information correction unit corrects the contents of the dictionary information as desired via the input unit. The user can therefore modify the dictionary information directly through the input unit and customize it.

[0017] In the dictionary creation device described above, the character string extraction unit includes an adjustment processing unit that adjusts extraction so that each extracted string is suitable for registration as a headword in the dictionary information.

[0018] With this device, the adjustment processing unit ensures that the headwords registered in the dictionary information are appropriate. The dictionary information therefore becomes more accurate.

[0019] In the dictionary creation device described above, the adjustment processing unit includes a parenthesis processing unit that processes the character strings before and after a parenthesis symbol in an extracted string.

[0020] With this device, even if an extracted string contains parenthesis symbols, the strings before and after them are processed and adjusted so that they become usable headwords.

[0021] The extracted string can therefore be registered as a headword after its parentheses have been processed. This improves the accuracy of the registered headwords and, in turn, of the dictionary information.
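The text does not specify exactly how the strings around a parenthesis are processed. The Python sketch below shows one plausible interpretation, offered as an assumption. It separates "OUTER (INNER)" into two headword candidates and strips any unmatched parenthesis left at the edge of an extracted span. It accepts both half-width and full-width parentheses. The function name is hypothetical.

```python
import re

# Assumed rule: a balanced pair (half- or full-width) encloses a separate candidate.
_PAREN = re.compile(r"[(（]([^()（）]*)[)）]")

def process_parentheses(candidate: str) -> list[str]:
    """Split 'OUTER (INNER)' into candidates and drop stray parentheses."""
    inner_parts = [m.strip() for m in _PAREN.findall(candidate)]
    outer = _PAREN.sub(" ", candidate)
    # A lone bracket whose partner fell outside the extracted span is removed.
    outer = re.sub(r"[()（）]", " ", outer)
    outer = " ".join(outer.split())
    return [s for s in [outer, *inner_parts] if s]
```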

[0022] In the dictionary creation device described above, the adjustment processing unit includes a specific-symbol splitting unit. This unit splits an extracted string before and after a specific symbol it contains, producing separate extracted strings.

[0023] With this device, if an extracted string contains a symbol that separates strings, such as '/', the string is split at that symbol into individual extracted strings.

[0024] Headwords that still contain such a symbol are therefore never registered in the dictionary information, which improves its accuracy.
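A minimal Python sketch of specific-symbol splitting follows. The text names '/' as an example separator. Adding ';' and ',' to the default set is an assumption of this sketch, and the function name is hypothetical.

```python
import re

# '/' comes from the text; ';' and ',' are assumed additional separators.
SPLIT_SYMBOLS = "/;,"

def split_on_symbols(candidate: str, symbols: str = SPLIT_SYMBOLS) -> list[str]:
    """Split an extracted string at separator symbols into separate candidates."""
    pattern = "[" + re.escape(symbols) + "]"
    parts = (p.strip() for p in re.split(pattern, candidate))
    return [p for p in parts if p]
```

Some strings, such as "TCP/IP", are arguably single terms. A real implementation would probably let the user tune the symbol set, in line with the user customization stressed in [0007].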

[0025] In the dictionary creation device described above, the adjustment processing unit includes a digit-string removal unit that removes digit strings from an extracted string.

[0026] With this device, digit strings are removed from extracted strings. This prevents several headwords that differ only in a digit string, such as a product version number, from all being registered and making the dictionary information unnecessarily large. It also makes the headwords more general.
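Digit-string removal can be sketched in a few lines of Python. Treating dotted version numbers such as "4.5" as a single digit string is an assumption of this sketch, and the function name is hypothetical. The text does not say how digits embedded inside a word should be handled. This sketch removes them as well, so "MP3" becomes "MP", which may not be what a real system wants.

```python
import re

# Digit runs, optionally dotted like a version number (assumed pattern).
_DIGITS = re.compile(r"\d+(?:\.\d+)*")

def remove_digit_strings(candidate: str) -> str:
    """Drop digit strings from an extracted string and tidy the spacing."""
    return " ".join(_DIGITS.sub(" ", candidate).split())
```

Applied to "Windows 95" and "Windows 98", both strings reduce to the single headword "Windows". That is the size saving [0026] describes.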

[0027] In the dictionary creation device described above, the adjustment processing unit includes a layout information processing unit. This unit splits an extracted string into individual extracted strings based on information within it that indicates the document's layout.

[0028] With this device, if an extracted string contains information indicating the document's layout, the string can be split based on that layout information and registered as separate headwords in the dictionary information.

[0029] Headwords that still contain layout information such as line-feed codes are therefore never registered, which improves the accuracy of the dictionary information.
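Layout-based splitting can be sketched in Python by treating control codes as hard boundaries between candidates. The text mentions line-feed codes. Including carriage return, tab, and form feed in the set is an assumption, and the function name is hypothetical.

```python
import re

# Layout codes treated as separators (line feed per the text; others assumed).
_LAYOUT = re.compile(r"[\r\n\t\f]+")

def split_on_layout(candidate: str) -> list[str]:
    """Split an extracted string at layout codes and normalize inner spaces."""
    parts = (" ".join(p.split()) for p in _LAYOUT.split(candidate))
    return [p for p in parts if p]
```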

[0030] In the dictionary creation device described above, the adjustment processing unit includes a word-count processing unit. This unit suppresses registration of an extracted string in the extraction information when the string contains no more than a predetermined number of words.

[0031] With this device, extracted strings consisting of no more than the predetermined number of words are prevented from being registered as headwords in the dictionary information.

[0032] Consider an extracted string that is a single word of a kind already registered in an existing first-language dictionary, such as a verb, adjective, or pronoun. Such a basic word is not registered as a headword. This avoids erroneous translation results when the dictionary information is used.

[0033] In the dictionary creation device described above, the adjustment processing unit includes a string-length processing unit. For an extracted string consisting of no more than the predetermined number of words, this unit suppresses its registration in the extraction information when its length does not exceed a predetermined length.

[0034] With this device, an extracted string of no more than the predetermined number of words is never registered as a headword if its length does not exceed the predetermined length.

[0035] An extracted string of no more than the predetermined number of words is still registered as a headword when it is longer than the predetermined length. An example is a long word formed by adding a prefix or suffix. When the string is at or below the predetermined length, it is likely to be a basic word of the first language and is not registered. This improves the accuracy of the registered headwords and of the dictionary information itself.

[0036] In the dictionary creation device described above, the adjustment processing unit includes a letter-case processing unit. For an extracted string consisting of no more than the predetermined number of words, this unit keeps its registration in the extraction information when the string contains both uppercase and lowercase letters.

[0037] With this device, an extracted string of no more than the predetermined number of words is still registered as a headword if it contains both uppercase and lowercase letters.

[0038] Such a string, even though it has no more than the predetermined number of words, is treated as something that should not be translated with the basic dictionary, such as a proper noun. It is therefore registered as a headword, which improves the accuracy of the dictionary information.
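The word-count, string-length, and letter-case rules of [0030] to [0038] interact. A short string is dropped unless it is long enough or has mixed case. The Python sketch below combines them into a single keep-or-drop decision. The thresholds stand in for the unspecified "predetermined number" of words and "predetermined length", and their values are hypothetical. The evaluation order is also an assumption.

```python
MAX_SHORT_WORDS = 1   # hypothetical "predetermined number" of words
MAX_SHORT_LENGTH = 8  # hypothetical "predetermined length" in characters

def has_mixed_case(s: str) -> bool:
    """True if the string contains both uppercase and lowercase letters."""
    return any(c.isupper() for c in s) and any(c.islower() for c in s)

def keep_candidate(candidate: str,
                   max_words: int = MAX_SHORT_WORDS,
                   max_length: int = MAX_SHORT_LENGTH) -> bool:
    """Decide whether an extracted string stays in the extraction information."""
    if len(candidate.split()) > max_words:
        return True  # longer phrases are not filtered by these rules
    if has_mixed_case(candidate):
        return True  # letter-case rule: likely a proper noun such as "JavaScript"
    # String-length rule: short single words are probably basic vocabulary.
    return len(candidate) > max_length
```

With these thresholds, an all-caps acronym such as "HTML" is dropped because it is short and not mixed-case. This shows why the thresholds, which the text leaves open, would need tuning in practice.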

[0039] In the dictionary creation device described above, the adjustment processing unit includes a part-of-speech processing unit. This unit determines the part of speech of an extracted string from the character string that follows it in the document. It then decides whether to delete the extracted string from the extraction information, according to whether that part of speech is a predetermined one.

[0040] With this device, the part-of-speech processing unit ensures that extracted strings belonging to a predetermined part of speech in the first language are not registered as headwords in the dictionary information.

[0041] Some headwords would merely reproduce a first-language word of a predetermined part of speech, such as a verb, that an existing dictionary already covers. Such headwords are never registered in the dictionary information, which improves its accuracy.
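The text says only that the part of speech is judged from the string that follows the extracted string. One plausible reading for Japanese documents is that a Latin-letter word immediately followed by a verbal ending such as "する" is being used as a verb. The Python sketch below implements that heuristic. The cue list and the function name are assumptions, and the sketch does not claim to be the patented method.

```python
import re

# Hypothetical cues: Japanese verbal endings that, directly after a Latin-letter
# run, suggest the run is being used as a verb (e.g. "downloadして").
VERB_CUES = ("する", "した", "して", "します", "され", "できる")

def extract_non_verbs(document: str) -> list[str]:
    """Extract Latin-letter runs, skipping those whose following text marks a verb."""
    kept = []
    for m in re.finditer(r"[A-Za-z]+", document):
        following = document[m.end():].lstrip()
        if following.startswith(VERB_CUES):
            continue  # judged to be a verb: not registered
        kept.append(m.group())
    return kept
```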

[0042] The dictionary creation device described above further includes an auxiliary data creation unit that creates auxiliary data used to assist in creating the dictionary information. The adjustment processing unit performs its adjustment with reference to the auxiliary data created by the auxiliary data creation unit.

[0043] With this device, the adjustment processing unit refers to the auxiliary data when adjusting, which improves the accuracy of the headwords registered in the dictionary information.

[0044] In the dictionary creation device described above, the auxiliary data creation unit includes a frequency information creation unit. For each extracted string, this unit counts how many times the string was extracted from the document and creates frequency information. The adjustment processing unit includes a frequency information processing unit. This unit deletes from the extraction information any extracted string whose extraction count, according to the frequency information, is less than a predetermined number.

[0045] With this device, the frequency information creation unit creates frequency information indicating how many times each string was extracted from the document. The frequency information processing unit then prevents strings whose extraction count is below the predetermined number from being registered as headwords.

[0046] Whether to register an extracted string as a headword can therefore be decided by how often the string appears in the document. Infrequent strings can be kept out of the dictionary information, for example strings that appear only by chance because the author forgot to use the translated term. This improves the accuracy of the dictionary information.
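Frequency information and the frequency-based filter map naturally onto `collections.Counter`. In the minimal Python sketch below, the threshold value is a hypothetical stand-in for the unspecified "predetermined number of times", and the function name is also hypothetical.

```python
from collections import Counter

MIN_COUNT = 2  # hypothetical "predetermined number of times"

def filter_by_frequency(extracted: list[str], min_count: int = MIN_COUNT) -> dict[str, int]:
    """Count extractions per string (frequency information) and keep frequent ones."""
    freq = Counter(extracted)
    return {s: n for s, n in freq.items() if n >= min_count}
```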

[0047] In the dictionary creation device described above, the auxiliary data creation unit includes a dictionary lookup result creation unit. This unit looks up each extracted string in a dictionary group, which includes the dictionary information and the existing dictionary, and creates lookup results. The adjustment processing unit includes a dictionary lookup processing unit. This unit suppresses registration of an extracted string in the extraction information when the lookup result shows that the string is already registered in the existing dictionary group.

[0048] With this device, the lookup result may show that an extracted string is already registered in the existing dictionary group. In that case the dictionary lookup processing unit prevents the string from being registered as a headword in the dictionary information.
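The dictionary-lookup check amounts to a membership test against every dictionary in the group. The Python sketch below represents each dictionary as a mapping from headword to translation. That representation, the exact-match (case-sensitive) comparison, and the function name are all assumptions.

```python
from collections.abc import Mapping

def filter_unregistered(candidates: list[str], *dictionaries: Mapping[str, str]) -> list[str]:
    """Keep only candidates that no dictionary in the group already contains."""
    return [c for c in candidates if not any(c in d for d in dictionaries)]
```

An exact match is used deliberately. A case-insensitive lookup would, for example, let a common noun such as "windows" in the existing dictionary suppress the proper noun "Windows".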

[0049] Extracted strings already registered in the existing dictionary group are therefore not registered again as headwords, which keeps the dictionary information from growing unnecessarily large.

In the dictionary creation device described above, the auxiliary data creation unit includes a translation result creation unit that translates each extracted string with reference to the dictionary group and creates a translation result.

[0050] With this device, the result of translating an extracted string with the dictionary group is used as data that assists in creating the dictionary information. This makes it possible to enrich the content of the created dictionary.

[0051] In the dictionary creation device described above, the adjustment processing unit includes a first translation processing unit. This unit keeps an extracted string registered in the extraction information when the translation result created by the translation result creation unit is not appropriate.

[0052] With this device, when translating an extracted string with reference to the dictionary group does not give an appropriate result, the extracted string is registered as a headword in the dictionary information.

[0053] An extracted string whose translation is not appropriate is one that the basic dictionary information does not translate properly. Such a string is registered as a headword, so later translations of it become appropriate by referring to the dictionary information.

In the dictionary creation device described above, the adjustment processing unit includes a second translation processing unit. This unit suppresses registration of an extracted string in the extraction information when the translation result created by the translation result creation unit is the extracted string itself.

【００５４】この辞書作成装置によれば、辞書群を参照
して抽出文字列を翻訳した結果が、該抽出文字列を示す
場合には、当該抽出情報が見出し語として辞書情報に登
録されるのが抑制される。According to this dictionary creation device, if the result of translating the extracted character string with reference to the dictionary group indicates the extracted character string itself, registration of the extracted information in the dictionary information as a headword is suppressed.

【００５５】したがって、翻訳結果が該抽出文字列とな
るような場合、言換えると、ある抽出文字列を辞書群を
用いて翻訳した場合に翻訳結果として該抽出文字列が得
られた場合には、該抽出文字列そのものが既に辞書群に
登録されている状態を示すことになる。したがって、こ
のような場合には、該抽出文字列は見出し語として辞書
情報に登録されるのは抑制て、該抽出文字列が見出し語
として重複登録されるのが回避される。それゆえに、辞
書の容量を適正レベルに維持して、辞書の検索効率を高
めることができる。Therefore, suppose that translating an extracted character string with the dictionary group yields that same character string as the translation result. This indicates that the extracted character string itself is already registered in the dictionary group. In such a case, registration of the extracted character string in the dictionary information as a headword is suppressed, which avoids registering the same string twice as a headword. As a result, the dictionary size can be kept at an appropriate level and dictionary search efficiency can be improved.
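The two translation processing units described above can be sketched as a classification of each extracted string by its translation result. This is a minimal sketch under stated assumptions, not the patented implementation:

- The `translate` callable is a hypothetical stand-in for the translation result creation unit 292. It is assumed to return `None` when no proper translation is obtained.
- The text does not say what happens when a string translates properly into something else. Here that case is labelled `"undecided"` for user review.

```python
from typing import Callable, Dict, List, Optional

def translation_check(
    candidates: List[str],
    translate: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Classify extracted strings by the result of translating them.

    `translate` is a hypothetical stand-in for the translation result
    creation unit 292; it returns None when no proper translation exists.
    """
    decisions: Dict[str, str] = {}
    for s in candidates:
        result = translate(s)
        if result is None:
            # Translation not appropriate: keep the string registered
            # (first translation processing unit).
            decisions[s] = "keep"
        elif result == s:
            # Translation is the string itself: treated as already in the
            # dictionary group, so registration is suppressed
            # (second translation processing unit).
            decisions[s] = "suppress"
        else:
            # Case not covered by the text; left for user review (assumption).
            decisions[s] = "undecided"
    return decisions

# Toy dictionary group, for illustration only.
toy = {"OS": "オペレーティングシステム", "Java": "Java"}
decisions = translation_check(["OS", "Java", "MPEG-4"], toy.get)
# {'OS': 'undecided', 'Java': 'suppress', 'MPEG-4': 'keep'}
```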

【００５６】上述の辞書作成装置においては、辞書情報
作成部が、辞書情報を補正するための辞書情報補正部を
備えて構成される。In the above-described dictionary creating apparatus, the dictionary information creating section is provided with a dictionary information correcting section for correcting dictionary information.

【００５７】この辞書作成装置によれば、作成される辞
書情報は辞書情報補正部を用いて補正される。したがっ
て、辞書情報をさらに適正な内容となるように補正でき
る。According to this dictionary creating apparatus, the created dictionary information is corrected by using the dictionary information correcting section. Therefore, the dictionary information can be corrected so as to have more appropriate contents.

【００５８】上述の辞書作成装置においては、見出し語
情報は対応する前記見出し語の訳語を含み、辞書情報補
正部は、訳語として対応する見出し語の文字を全角文字
に変換して登録する訳語全角化部を備えて構成される。In the above-described dictionary creation device, the headword information includes a translation of the corresponding headword. The dictionary information correction unit includes a translated-word full-width conversion unit that converts the characters of the corresponding headword into full-width characters and registers the result as the translation.

【００５９】この辞書作成装置によれば、辞書情報補正
部の訳語全角化部は、辞書情報における見出し語情報の
訳語に、対応する見出し語の文字が全角文字に変換され
たものを登録する。したがって、翻訳時に当該辞書情報
が参照されることにより見出し語に該当する単語に訳語
として当該見出し語を第１種類言語の全角文字にしたも
のを当てることができる。According to this dictionary creation device, the translated-word full-width conversion unit of the dictionary information correction unit registers the corresponding headword, converted into full-width characters, as the translation in the headword information. Therefore, when this dictionary information is referred to during translation, a word matching the headword can be given the headword written in full-width characters of the first type language as its translation.
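The full-width conversion described above can be sketched in a few lines. The sketch assumes the headword is half-width ASCII:

- Printable ASCII `'!'..'~'` is shifted into the full-width block U+FF01..U+FF5E.
- A space becomes the ideographic space U+3000.

The `full_width_entry` record layout is hypothetical.

```python
def to_full_width(text: str) -> str:
    """Convert printable ASCII to its full-width form (U+FF01..U+FF5E)."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == " ":
            out.append("\u3000")            # ideographic (full-width) space
        elif 0x21 <= code <= 0x7E:
            out.append(chr(code + 0xFEE0))  # '!'..'~' -> '！'..'～'
        else:
            out.append(ch)                  # non-ASCII: left unchanged
    return "".join(out)

def full_width_entry(headword: str) -> dict:
    """Hypothetical record: headword with its full-width form as the translation."""
    return {"headword": headword, "translation": to_full_width(headword)}

entry = full_width_entry("MPEG-4")  # translation: "ＭＰＥＧ－４"
```

The reverse mapping, from full-width back to half-width, is available in Python's standard library as `unicodedata.normalize("NFKC", s)`.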

【００６０】上述の辞書作成装置においては、見出し語
情報は対応する見出し語の訳語を含み、辞書情報補正部
が、文書から見出し語の訳語を決定して付与する訳語決
定部を備えて構成される。In the above-described dictionary creation device, the headword information includes the translation of the corresponding headword. The dictionary information correction unit includes a translated-word determination unit that determines the translation of a headword from the document and assigns it.

【００６１】この辞書作成装置によれば、辞書情報にお
いては訳語決定部により、見出し語に対応して文書の内
容から決定された訳語が付与される。したがって、文書
中に当該見出し語の訳語が併記されていた場合には、こ
れを対応する訳語として辞書情報に登録できるから、辞
書情報の精度は向上する。According to this dictionary creation device, the translated-word determination unit assigns to each headword in the dictionary information a translation determined from the contents of the document. Therefore, when the document gives the translation of a headword alongside it, that translation can be registered in the dictionary information as the corresponding translation, which improves the accuracy of the dictionary information.
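One way the translated-word determination described above might find a translation given alongside a headword is shown below. This is a sketch under an assumed annotation convention: a katakana/kanji term immediately followed by an English term in half- or full-width parentheses, such as `デジタルビデオディスク（DVD）`. The reversed order, `DVD（デジタルビデオディスク）`, is not handled here.

```python
import re
from typing import Dict

# Assumed convention: "<katakana/kanji term>（<English term>）".
# Hiragana is excluded so that particles such as "は" or "と" end the Japanese term.
PAIR = re.compile(
    r"([\u30a0-\u30ff\u4e00-\u9fff]+)[（(]([A-Za-z][A-Za-z0-9 .\-]*)[）)]"
)

def extract_translations(document: str) -> Dict[str, str]:
    """Map each English headword to the Japanese term written just before it."""
    found: Dict[str, str] = {}
    for m in PAIR.finditer(document):
        japanese, english = m.group(1), m.group(2).strip()
        found.setdefault(english, japanese)  # keep the first occurrence
    return found

doc = "記憶媒体としてデジタルビデオディスク（DVD）を用い、光学文字認識(OCR)も行う。"
pairs = extract_translations(doc)
# {'DVD': 'デジタルビデオディスク', 'OCR': '光学文字認識'}
```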

【００６２】上述の辞書作成装置においては、見出し語
情報は対応する見出し語の訳語を含み、辞書情報補正部
が、辞書情報中の１つ以上の見出し語において文字列の
表記が類似する場合には、当該１つ以上の見出し語につ
いて同一となる訳語を付与するための訳語情報統一部を
含むように構成される。In the above-described dictionary creation device, the headword information includes the translation of the corresponding headword. The dictionary information correction unit is configured to include a translated-word information unifying unit that assigns the same translation to one or more headwords in the dictionary information whose character-string notations are similar.

【００６３】この辞書作成装置によれば、辞書情報補正
部の訳語情報統一部により、辞書情報中の１つ以上の見
出し語においてその文字列の表記が相互に類似する場合
には、当該１つ以上の見出し語について同一訳語が付与
される。According to this dictionary creation device, when the character-string notations of one or more headwords in the dictionary information are similar to each other, the translated-word information unifying unit of the dictionary information correction unit assigns the same translation to those headwords.

【００６４】したがって、同一事物を表す見出し語であ
りながら、大小文字の違い、空白やハイフン記号の有無
などの本質的でない差異のために別の見出し語として辞
書情報に登録されてしまい、辞書情報の容量が必要以上
に大きくなることが抑制される。Therefore, headwords that represent the same thing are not registered in the dictionary information as separate headwords merely because of non-essential differences, such as letter case or the presence or absence of spaces and hyphens. This keeps the dictionary information from growing larger than necessary.
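The translated-word information unifying unit described above can be sketched as grouping headwords by a normalized notation key. The key ignores exactly the differences the text names: case, spaces and hyphens. Two choices in the sketch are assumptions and are not specified in the text:

- which variant's translation is shared (here, the first registered);
- the dictionary-of-strings representation.

```python
from collections import defaultdict
from typing import Dict, List

def notation_key(headword: str) -> str:
    """Normalize away case, spaces and hyphens (the non-essential differences)."""
    return headword.lower().replace(" ", "").replace("-", "")

def unify_translations(entries: Dict[str, str]) -> Dict[str, str]:
    """Give headwords with similar notations one shared translation."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for headword in entries:
        groups[notation_key(headword)].append(headword)
    unified: Dict[str, str] = {}
    for members in groups.values():
        shared = entries[members[0]]  # assumption: first-registered variant wins
        for headword in members:
            unified[headword] = shared
    return unified

entries = {"E-mail": "電子メール", "email": "Eメール", "HTML": "HTML文書"}
unified = unify_translations(entries)
# {'E-mail': '電子メール', 'email': '電子メール', 'HTML': 'HTML文書'}
```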

【００６５】上述の辞書作成装置においては、辞書情報
作成部により作成された辞書情報の内容を、入力部を介
して外部から入力される情報に従って所望されるように
修正するための情報修正部がさらに備えられて構成され
る。In the dictionary creating apparatus described above, the information modifying unit for modifying the contents of the dictionary information created by the dictionary information creating unit as desired in accordance with information input from the outside via the input unit is provided. Further provided and configured.

【００６６】この辞書作成装置によれば、作成された辞
書情報は、情報修正部により外部から、たとえばユーザ
から与えられる情報に従いユーザが所望するように変更
できて、容易にユーザカスタマイズされた辞書情報、ま
たは精度の高い辞書情報を得ることができる。According to this dictionary creation device, the information correction unit can change the created dictionary information as the user wishes, in accordance with information given from outside, for example by the user. User-customized or highly accurate dictionary information can therefore be obtained easily.

【００６７】上述の辞書作成装置においては、情報修正
部が、調整処理部による調整処理の内容を入力部を介し
た外部からの指示に従い指定するための外部指定部を備
えて構成される。In the above-described dictionary creation device, the information correction unit includes an external designation unit for designating the contents of the adjustment processing by the adjustment processing unit in accordance with an external instruction via the input unit.

【００６８】この辞書作成装置によれば、情報修正部の
外部指定部により、調整処理部による調整処理の内容を
入力部を介した外部からの指示に従い指定する。したが
って、ユーザは入力部を介して外部から指示を与えるだ
けで外部指定部により調整処理部の調整処理内容を所望
するように可変調整できて辞書情報をよりユーザカスタ
マイズできる。According to this dictionary creation device, the contents of the adjustment processing by the adjustment processing unit are specified by the external specification unit of the information correction unit in accordance with an external instruction via the input unit. Therefore, the user can variably adjust the adjustment processing content of the adjustment processing unit as desired by the external designation unit only by giving an instruction from the outside via the input unit, and can further customize the dictionary information.

【００６９】上述の辞書作成装置においては、外部指定
部により、調整処理部が有する前述した１つ以上の処理
部の適用の有無が、入力部を介した外部からの指示に従
い可変指定されるように構成される。In the above-described dictionary creation device, the external designation unit is configured to variably designate whether each of the one or more processing units of the adjustment processing unit described above is applied. This designation follows an external instruction given via the input unit.

【００７０】この辞書作成装置によれば、外部指定部に
より、調整処理部が有する前述した１つ以上の処理部の
適用の有無が、入力部を介した外部の指示に従い可変指
定されるから、ユーザは適用を所望する処理部を入力部
を介して外部指定して、所望される処理部を用いて作成
された辞書情報を得ることができる。According to this dictionary creation device, the external designation section variably designates whether or not to apply one or more of the above-described processing sections of the adjustment processing section in accordance with an external instruction through the input section. The user can externally designate a processing unit desired to be applied via the input unit, and obtain dictionary information created using the desired processing unit.

【００７１】上述の辞書作成装置においては、情報修正
部が、調整処理部の１つ以上の処理部それぞれについ
て、当該処理部が適用された辞書情報中の見出し語を最
終的に辞書情報に登録するか否かを入力部を介して決定
するための登録決定部をさらに備えて構成される。In the above-described dictionary creation device, the information correction unit further includes a registration determination unit. For each of the one or more processing units of the adjustment processing unit, it determines via the input unit whether a headword in the dictionary information to which that processing unit has been applied is finally registered in the dictionary information.

【００７２】この辞書作成装置によれば、登録決定部に
より調整処理部の１つ以上の処理部それぞれについて、
当該処理部が適用された辞書情報中の見出し語を最終的
に辞書情報に登録するか否かが入力部を介して決定され
る。したがって、ユーザは入力部を操作して、登録決定
部により各処理部が適用されて調整された見出し語につ
いて辞書情報への登録を維持しておくか、または登録を
削除するか任意に選択できて、よりユーザカスタマイズ
された辞書情報を得ることができる。According to this dictionary creation device, the registration determination unit determines via the input unit, for each of the one or more processing units of the adjustment processing unit, whether a headword to which that processing unit has been applied is finally registered in the dictionary information. The user can therefore operate the input unit to choose, for each headword adjusted by a processing unit, whether to keep or delete its registration in the dictionary information. This yields dictionary information that is more closely customized to the user.

【００７３】上述の辞書作成装置においては、外部指定
部による指定時には、補助データ作成部による作成デー
タが提示されるよう構成される。The dictionary creating apparatus described above is configured so that the data created by the auxiliary data creating unit is presented at the time of designation by the external designation unit.

【００７４】この辞書作成装置によれば、外部指定部に
よる指定時には、補助データ作成部により作成されたデ
ータが提示される。したがって、ユーザは提示される作
成データを参考にして、入力部を介して調整処理部の調
整内容に関するより的確な指示を与えて、ユーザカスタ
マイズされた辞書情報を得ることができる。According to this dictionary creation device, at the time of designation by the external designation unit, the data created by the auxiliary data creation unit is presented. Therefore, the user can refer to the presented creation data, give a more accurate instruction regarding the adjustment content of the adjustment processing unit via the input unit, and obtain user-customized dictionary information.

【００７５】この発明のさらなる他の局面に係る翻訳装
置は、第１種類の文字で示される言語から第２種類の文
字で示される言語に翻訳処理するものあって、既存の辞
書と、複数種類の文字が混在する文書から抽出された第
１種類の文字からなる文字列であり既存辞書には未登録
の１つ以上の見出し語と、見出し語のそれぞれについて
該見出し語に関する情報を見出し語情報として有する辞
書情報とが格納された辞書格納部を有して、翻訳処理の
ために辞書格納部中の情報を参照するように構成され
る。A translation device according to still another aspect of the present invention translates from a language represented by a first type of character into a language represented by a second type of character. It has a dictionary storage unit that stores an existing dictionary and dictionary information, and it refers to the information in the dictionary storage unit during translation. The dictionary information contains one or more headwords, each with information about it as headword information. Each headword is a character string consisting of the first type of character that was extracted from a document in which plural types of characters are mixed and is not registered in the existing dictionary.

【００７６】この翻訳装置によれば、第２種類言語の文
章中であっても通常は第１種類言語で記述されるような
用語で既存の辞書には未登録の見出し語（新語、専門用
語）をまとめた辞書情報を参照して、第１種類の文字で
示される第１言語の文書を第２種類の文字で示される第
２言語の文書に翻訳する際に、当該辞書情報を参照する
ので精度の高い翻訳処理が可能となる。This translation device refers to dictionary information that collects headwords (new words, technical terms) not registered in the existing dictionary. Such terms are normally written in the first type language even within text of the second type language. The device consults this dictionary information when translating a first-language document, written in the first type of character, into a second-language document, written in the second type of character, so highly accurate translation becomes possible.
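How such a translation device might consult the dictionary storage unit can be sketched as a word-by-word lookup. The sketch is deliberately simple and omits everything a real translator does beyond dictionary lookup. These parts are assumptions:

- the lookup order (existing dictionary first);
- passing unknown words through unchanged;
- the plain-dictionary representation.

```python
from typing import Dict, List, Optional

def lookup(word: str, existing: Dict[str, str], created: Dict[str, str]) -> Optional[str]:
    """Consult the existing dictionary first, then the created dictionary information."""
    if word in existing:
        return existing[word]
    # Headwords in the created dictionary information are, by construction,
    # absent from the existing dictionary, so the two never compete for a word.
    return created.get(word)

def translate_words(words: List[str], existing: Dict[str, str],
                    created: Dict[str, str]) -> List[str]:
    """Word-by-word lookup; words found in neither dictionary pass through (assumption)."""
    out = []
    for w in words:
        t = lookup(w, existing, created)
        out.append(t if t is not None else w)
    return out

existing = {"system": "システム"}
created = {"XML": "ＸＭＬ"}
result = translate_words(["XML", "system", "foo"], existing, created)
# ['ＸＭＬ', 'システム', 'foo']
```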

【００７７】上述の翻訳装置では、辞書情報は、前述し
た辞書作成装置で作成されるように構成される。In the above-described translation apparatus, the dictionary information is configured to be created by the above-described dictionary creation apparatus.

【００７８】この発明のさらなる他の局面に係る辞書作
成方法は、第１種類の文字で示される第１言語から第２
種類の文字で示される第２言語に翻訳するために用いら
れて翻訳のために用いられて既存の辞書情報には未登録
の見出し語が登録された辞書情報を作成する方法であっ
て、複数種類の文字が混在する文書を含む情報を入力す
るための入力ステップと、入力ステップから入力された
文書から第１種類の文字からなる１つ以上の文字列を抽
出して、抽出文字列が登録された抽出情報を出力する文
字列抽出ステップと、文字列抽出ステップから出力され
た抽出情報の抽出文字列を見出し語とし、該見出し語の
それぞれに対応して該見出し語に関する情報を見出し語
情報として付与して辞書情報を作成する辞書情報作成ス
テップとを備えて構成される。A dictionary creation method according to still another aspect of the present invention creates dictionary information for translating a first language, represented by a first type of character, into a second language, represented by a second type of character. The dictionary information registers headwords that are not in the existing dictionary information used for translation. The method comprises the following steps:
(1) an input step of inputting information including a document in which plural types of characters are mixed;
(2) a character string extraction step of extracting, from the document input in the input step, one or more character strings consisting of the first type of character, and outputting extraction information in which the extracted character strings are registered;
(3) a dictionary information creation step of taking each extracted character string in that extraction information as a headword, and creating the dictionary information by assigning to each headword information about it as headword information.
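The character string extraction and dictionary information creation steps above can be sketched with a regular expression. This follows the embodiment's case, where English strings are extracted from a Japanese document. The following are illustrative assumptions, not the patent's definitions:

- the definition of a "first type" run (half-width Latin letters, with digits and hyphens, and single spaces between words);
- the empty record layout.

Full-width Latin letters are not handled here.

```python
import re
from typing import Dict, List, Optional

# Assumption: the first type of character is half-width Latin letters; a run may
# contain digits and hyphens, with single spaces between words.
LATIN_RUN = re.compile(r"[A-Za-z][A-Za-z0-9\-]*(?: [A-Za-z0-9][A-Za-z0-9\-]*)*")

def extract_strings(document: str) -> List[str]:
    """Character string extraction step: unique Latin runs in order of appearance."""
    seen: Dict[str, None] = {}
    for m in LATIN_RUN.finditer(document):
        seen.setdefault(m.group(0), None)
    return list(seen)

def create_dictionary(document: str) -> Dict[str, Dict[str, Optional[str]]]:
    """Dictionary information creation step: an empty headword record per string."""
    return {s: {"translation": None, "pos": None} for s in extract_strings(document)}

doc = "本装置はOCR機能とWord Processorを備える。OCRは高速である。"
headwords = extract_strings(doc)  # ['OCR', 'Word Processor']
```

The translation and part of speech are left empty here. They would be filled in later, for example by the correction units described elsewhere in this document.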

【００７９】この辞書作成方法によれば、複数種類の文
字が混在する文書から第１種類の文字から成る文字列で
あり、既存の辞書には未登録の文字列を抽出して見出し
語として登録された辞書情報が作成されて、第１種類の
文字で示される第１言語の文書を第２種類の文字で示さ
れる第２言語の文書に翻訳するために参照される辞書情
報が作成される。This dictionary creation method extracts, from a document in which plural types of characters are mixed, character strings that consist of the first type of character and are not registered in the existing dictionary. It then creates dictionary information in which those strings are registered as headwords. This dictionary information is referred to when translating a first-language document, written in the first type of character, into a second-language document, written in the second type of character.

【００８０】したがって、第２種類言語の文章中であっ
てもあえて第１種類言語で記述されるような用語（新
語、専門用語）であり、既存の辞書には未登録の見出し
語をまとめた辞書情報を作成して、第１種類の文字で示
される第１言語の文書を第２種類の文字で示される第２
言語の文書に翻訳する際に、当該辞書情報を参照するよ
うにすればより精度の高い翻訳処理が可能となる。Therefore, the method can create dictionary information that collects headwords (new words, technical terms) not registered in the existing dictionary. These are terms deliberately written in the first type language even within text of the second type language. Referring to this dictionary information when translating a first-language document, written in the first type of character, into a second-language document, written in the second type of character, makes more accurate translation possible.

【００８１】上述の辞書作成方法においては、辞書情報
作成ステップにより作成された辞書情報の内容を、入力
ステップを介して外部から入力される情報に従って所望
されるように修正するための情報修正ステップをさらに
備えて構成される。The above-described dictionary creation method further comprises an information correction step. This step corrects the contents of the dictionary information created in the dictionary information creation step as desired, in accordance with information input from outside via the input step.

【００８２】このような辞書作成方法によれば、作成さ
れた辞書情報は、情報修正ステップにより外部から、た
とえばユーザから与えられる情報に従いユーザが所望す
るように変更できて、容易にユーザカスタマイズされて
精度の高い辞書情報を得ることができる。この辞書情報
を参照することでより精度の高い翻訳ができる。According to this dictionary creation method, the information correction step allows the created dictionary information to be changed as the user wishes, in accordance with information given from outside, for example by the user. User-customized, highly accurate dictionary information can therefore be obtained easily. Referring to this dictionary information enables more accurate translation.

【００８３】この発明のさらなる他の局面に係る記録媒
体は、第１言語から第２言語に翻訳するために用いられ
る辞書情報を作成する辞書作成方法をコンピュータに実
行させるための辞書作成プログラムを記録したコンピュ
ータで読取可能な記録媒体であり、以下の特徴を有す
る。A recording medium according to still another aspect of the present invention stores a dictionary creating program for causing a computer to execute a dictionary creating method for creating dictionary information used for translating from a first language to a second language. This is a computer-readable recording medium having the following characteristics.

【００８４】つまり、辞書作成方法は、第１種類の文字
で示される第１言語から第２種類の文字で示される第２
言語に翻訳するために用いられて翻訳のために用いられ
る既存の辞書は未登録の見出し語が登録された辞書情報
を作成する辞書作成方法であって、複数種類の文字が混
在する文書を含む情報を入力するための入力ステップ
と、入力ステップから入力された文書から第１種類の文
字からなる１つ以上の文字列を抽出して、抽出文字列が
登録された抽出情報を出力する文字列抽出ステップと、
文字列抽出ステップから出力された抽出情報の抽出文字
列を見出し語とし、該見出し語のそれぞれに対応して該
見出し語に関する情報を見出し語情報として付与して辞
書情報を作成する辞書情報作成ステップとを備える。That is, the dictionary creation method creates dictionary information for translating a first language, represented by a first type of character, into a second language, represented by a second type of character. The dictionary information registers headwords that are not in the existing dictionary used for translation. The method comprises the following steps:
(1) an input step of inputting information including a document in which plural types of characters are mixed;
(2) a character string extraction step of extracting, from the document input in the input step, one or more character strings consisting of the first type of character, and outputting extraction information in which the extracted character strings are registered;
(3) a dictionary information creation step of taking each extracted character string in that extraction information as a headword, and creating the dictionary information by assigning to each headword information about it as headword information.

【００８５】この記録媒体に記録されたプログラムによ
り実行される辞書作成方法によれば、複数種類の文字が
混在する文書から第１種類の文字から成る文字列を抽出
して見出し語として登録された辞書情報が作成されて、
第１種類の文字で示される第１言語の文書を第２種類の
文字で示される第２言語の文書に翻訳するために参照さ
れる辞書情報が作成される。According to the dictionary creation method executed by the program recorded on this recording medium, character strings consisting of the first type of character are extracted from a document in which plural types of characters are mixed. Dictionary information is then created in which those strings are registered as headwords. This dictionary information is referred to when translating a first-language document, written in the first type of character, into a second-language document, written in the second type of character.

【００８６】したがって、第２種類言語の文章中であっ
ても通常は第１種類言語で記述されるような用語（新
語、専門用語）であり、既存の辞書には未登録の見出し
語をまとめた辞書情報を作成できて、第１種類の文字で
示される第１言語の文書を第２種類の文字で示される第
２言語の文書に翻訳する際に、当該辞書情報を参照する
ようにすればより精度の高い翻訳処理が可能となる。Therefore, the method can create dictionary information that collects headwords (new words, technical terms) not registered in the existing dictionary. These are terms normally written in the first type language even within text of the second type language. Referring to this dictionary information when translating a first-language document, written in the first type of character, into a second-language document, written in the second type of character, makes more accurate translation possible.

【００８７】上述のプログラムで実行される辞書作成方
法は、さらに、辞書情報作成ステップにより作成された
辞書情報の内容を、入力ステップを介して外部から入力
される情報に従って所望されるように修正するための情
報修正ステップを備えて構成される。The dictionary creation method executed by the above-described program further comprises an information correction step. This step corrects the contents of the dictionary information created in the dictionary information creation step as desired, in accordance with information input from outside via the input step.

【００８８】この辞書作成方法によれば、作成された辞
書情報は、情報修正ステップにより外部から、たとえば
ユーザから与えられる情報に従いユーザが所望するよう
に変更できて、容易にユーザカスタマイズされて精度の
高い辞書情報を得ることができる。According to this dictionary creation method, the information correction step allows the created dictionary information to be changed as the user wishes, in accordance with information given from outside, for example by the user. User-customized, highly accurate dictionary information can therefore be obtained easily.

【００８９】[0089]

【発明の実施の形態】以下、この発明の実施の形態につ
いて説明する。Embodiments of the present invention will be described below.

【００９０】図１は、この発明の実施の形態による辞書
作成装置の機能構成を示す図である。図２は、この発明
の実施の形態による辞書作成装置のハードウェア構成を
示す図である。図３は、図１の各機能を用いた辞書作成
のための概略処理手順を示すフローチャートである。FIG. 1 is a diagram showing the functional configuration of a dictionary creation device according to an embodiment of the present invention. FIG. 2 is a diagram showing the hardware configuration of the dictionary creation device according to this embodiment. FIG. 3 is a flowchart showing an outline of the processing procedure for creating a dictionary using the functions of FIG. 1.

【００９１】図２を参照して、辞書作成装置５００は、
入力装置１００、出力装置２００、記憶部３００、該装
置自体を集中的に制御および管理するためのＣＰＵ（中
央処理装置の略）４００、および外部の記憶媒体３３と
の間でプログラムを含む各種データの転送を図るための
媒体アクセス装置３４を含む。Referring to FIG. 2, the dictionary creation device 500 includes:
- an input device 100;
- an output device 200;
- a storage unit 300;
- a CPU (central processing unit) 400 for centrally controlling and managing the device itself;
- a medium access device 34 for transferring various data, including programs, to and from an external storage medium 33.

【００９２】入力装置１００は、キーボード、マウス、
ペン・タブレットなどの外部操作可能なデータ入力のた
めのデバイスならびに外部の通信ネットワークを介して
データを入力するための通信デバイスなどを含んで構成
される。The input device 100 includes externally operable data input devices such as a keyboard, a mouse, and a pen tablet. It also includes a communication device for inputting data via an external communication network.

【００９３】出力装置２００は、液晶ディスプレイ、プ
ラズマディスプレイからなる表示装置やレーザプリン
タ、サーマルプリンタからなるプリンタを含んで構成さ
れる。The output device 200 includes a display device, such as a liquid crystal display or a plasma display, and a printer, such as a laser printer or a thermal printer.

【００９４】ＲＯＭ（リードオンリメモリ）およびＲＡ
Ｍ（ランダムアクセスメモリ）を含んで構成される記憶
部３００は、データ記憶部３とプログラム記憶部３０１
とを含む。プログラム記憶部３０１には、後述する各フ
ローチャートにして示される図１の各機能部における処
理プログラムが予め格納されて、ＣＰＵ４００の制御の
下に実行される。The storage unit 300 includes ROM (read-only memory) and RAM (random access memory), and contains a data storage unit 3 and a program storage unit 301. The program storage unit 301 stores in advance the processing programs of the functional units of FIG. 1, which are shown in the flowcharts described later. These programs are executed under the control of the CPU 400.

【００９５】図４は、図２のデータ記憶部３の記憶内容
の一例を示す図である。データ記憶部３には図４に示さ
れるように辞書作成装置５００において作成される辞書
データ３０、辞書データ３０を作成するために入力装置
１００を介して入力された文書データ４０、辞書作成に
おける各種規則が登録される規則データ５０、一般的に
基本とされる辞書のデータである基本辞書データ６０、
専門分野に適用される辞書のデータである専門辞書デー
タ７０および装置５００における各種処理において用い
られるデータである各種のデータ８０が登録される。FIG. 4 is a diagram showing an example of the contents of the data storage unit 3 of FIG. 2. As shown in FIG. 4, the data storage unit 3 holds the following data:
- the dictionary data 30 created by the dictionary creation device 500;
- the document data 40 input via the input device 100 for creating the dictionary data 30;
- rule data 50, in which various rules for dictionary creation are registered;
- basic dictionary data 60, the data of a general-purpose basic dictionary;
- specialized dictionary data 70, the data of dictionaries for specialized fields;
- various data 80 used in the various processes of the device 500.

【００９６】辞書データ３０は複数の異なる見出し語３
１のそれぞれに対応して辞書情報３２を含む。辞書情報
３２は対応する見出し語３１を翻訳して得られた訳語３
２Ａおよび対応する見出し語の品詞３２Ｂを含む。The dictionary data 30 includes dictionary information 32 for each of a plurality of different headwords 31. The dictionary information 32 includes a translation 32A, obtained by translating the corresponding headword 31, and the part of speech 32B of the corresponding headword.
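The layout of the dictionary data 30 described above can be sketched as plain data classes. The class names are hypothetical. The rule that refuses to overwrite an existing headword is an assumption; the text does not specify one.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class DictionaryInfo:
    """Dictionary information 32 for one headword."""
    translation: Optional[str] = None     # translation 32A
    part_of_speech: Optional[str] = None  # part of speech 32B

@dataclass
class DictionaryData:
    """Dictionary data 30: headword 31 -> dictionary information 32."""
    entries: Dict[str, DictionaryInfo] = field(default_factory=dict)

    def register(self, headword: str, translation: Optional[str] = None,
                 part_of_speech: Optional[str] = None) -> bool:
        """Register a new headword; return False if it is already present (assumed policy)."""
        if headword in self.entries:
            return False
        self.entries[headword] = DictionaryInfo(translation, part_of_speech)
        return True

d = DictionaryData()
d.register("DVD", "デジタルビデオディスク", "noun")  # True
d.register("DVD")                                    # False: already registered
```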

【００９７】規則データ５０は辞書データ３０の作成に
ついて適用することのできる１つ以上の規則の種類５１
および規則の種類５１のそれぞれについて該規則を辞書
データ３０の作成に際して適用するか否かを指定する適
用の有／無情報５２を含む。The rule data 50 includes one or more rule types 51 applicable to the creation of the dictionary data 30. For each rule type 51, it also includes application on/off information 52 designating whether that rule is applied when the dictionary data 30 is created.
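The pairing of rule types 51 with on/off information 52 can be sketched as a table of filter functions, each applied only when its flag is set. The following are illustrative assumptions:

- the two rules and their thresholds;
- the rule names;
- the fixed table order.

The patent's actual rule set is the headword recognition processing described for the headword recognition unit 5.

```python
from typing import Callable, Dict, List

RuleFn = Callable[[List[str]], List[str]]

def length_check(strings: List[str]) -> List[str]:
    # Drop strings of two characters or fewer (threshold is an assumption).
    return [s for s in strings if len(s) > 2]

def case_check(strings: List[str]) -> List[str]:
    # Drop all-lowercase strings, assumed here to be ordinary words (assumption).
    return [s for s in strings if not s.islower()]

# Hypothetical stand-in for rule data 50: rule type 51 -> rule function.
RULES: Dict[str, RuleFn] = {"length_check": length_check, "case_check": case_check}

def apply_rules(strings: List[str], enabled: Dict[str, bool]) -> List[str]:
    """Apply each rule whose on/off flag (information 52) is set, in table order."""
    for name, rule in RULES.items():
        if enabled.get(name, False):
            strings = rule(strings)
    return strings

kept = apply_rules(["OS", "java", "Linux"], {"length_check": True, "case_check": True})
# ['Linux']
```

Keeping the flags separate from the functions lets the user switch rules on or off without changing the rules themselves. This mirrors the role the text gives to the application on/off information.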

【００９８】次に、図１を参照して辞書作成装置５００
の機能構成について説明する。図１において辞書作成装
置５００は文書データ４０を入力するとともに利用者の
各種指定情報も入力するための入力部１、作成された辞
書データ３０を含む各種のデータの表示を含む出力また
は外部記憶媒体３３にデータを書込んで格納するための
出力部２、データ記憶部３、文字列抽出部４、見出し語
認識部５、辞書情報作成部１７、辞書情報補正部１８、
利用者支援部２２および補助データ作成部２９を含む。Next, the functional configuration of the dictionary creation device 500 will be described with reference to FIG. 1. In FIG. 1, the dictionary creation device 500 includes:
- an input unit 1 for inputting the document data 40 as well as various designation information from the user;
- an output unit 2 for outputting various data, including the created dictionary data 30, either by display or by writing the data to an external storage medium 33 for storage;
- a data storage unit 3;
- a character string extraction unit 4;
- a headword recognition unit 5;
- a dictionary information creation unit 17;
- a dictionary information correction unit 18;
- a user support unit 22;
- an auxiliary data creation unit 29.

【００９９】入力部１、見出し語認識部５、辞書情報補
正部１８および利用者支援部２２のそれぞれは本発明の
入力部、調整処理部、辞書情報補正部および情報修正部
のそれぞれに含まれる。Each of the input unit 1, headword recognition unit 5, dictionary information correction unit 18 and user support unit 22 is included in the input unit, adjustment processing unit, dictionary information correction unit and information correction unit of the present invention. .

【０１００】文字列抽出部４は入力部１から入力された
文書データ４０中から特定種類の文字列を抽出する。見
出し語認識部５は、文字列抽出部４により不適切な文字
列が抽出されないよう調整する。そのために見出し語認
識部５は括弧記号の前後の文字列を処理する括弧記号除
去分離部６、特定の記号の前後で文字列を分割する特定
記号分割部７、文字列中の数字列を除去する数字列除去
部８、レイアウト情報を利用して文字列の分割または抽
出を行なうレイアウト情報チェック部９、文字列中の単
語を数えて規定以下の単語数の文字列は抽出されないよ
うにするための単語数チェック部１０、文字列の長さを
検出して規定以下の長さの文字列は抽出されないように
するための文字列長チェック部１１、英数字の大文字・
小文字を判別する大小文字チェック部１２、抽出された
文字列の後方を見て抽出された文字列の品詞を判別する
品詞チェック部１３、後述する頻度情報作成部２９０が
作成する頻度情報をもとに、文書４０中で一定回数以上
出現した文字列のみが抽出されるようにする頻度情報チ
ェック部１４、後述する辞書引き結果作成部２９１が作
成する辞書引き結果をもとに、登録済の単語が辞書に追
加登録されないようチェックする辞書引きチェック部１
５、後述する翻訳結果作成部２９２が作成する翻訳結果
をもとに、翻訳結果が同等である単語は辞書に重複して
登録されないようチェックする翻訳チェック部１６を含
む。The character string extraction unit 4 extracts character strings of a specific type from the document data 40 input through the input unit 1. The headword recognition unit 5 makes adjustments so that the character string extraction unit 4 does not extract inappropriate character strings. For this purpose, the headword recognition unit 5 includes the following units:
- a parenthesis removal/separation unit 6 that processes the character strings before and after parenthesis symbols;
- a specific symbol division unit 7 that divides a character string before and after a specific symbol;
- a numeric string removal unit 8 that removes numeric strings from a character string;
- a layout information check unit 9 that divides or extracts character strings using layout information;
- a word count check unit 10 that counts the words in a character string so that strings with no more than a prescribed number of words are not extracted;
- a character string length check unit 11 that detects the length of a character string so that strings no longer than a prescribed length are not extracted;
- an upper/lower case check unit 12 that distinguishes uppercase and lowercase alphanumeric characters;
- a part-of-speech check unit 13 that determines the part of speech of an extracted string by looking at what follows it;
- a frequency information check unit 14 that, based on the frequency information created by the frequency information creation unit 290 described later, ensures that only strings appearing at least a certain number of times in the document 40 are extracted;
- a dictionary lookup check unit 15 that, based on the dictionary lookup result created by the dictionary lookup result creation unit 291 described later, checks that already registered words are not additionally registered in the dictionary;
- a translation check unit 16 that, based on the translation result created by the translation result creation unit 292 described later, checks that words with equivalent translation results are not registered in the dictionary twice.
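A few of the headword recognition steps listed above can be sketched as a small pipeline:

- parenthesis separation (unit 6);
- specific symbol division (unit 7);
- numeric string removal (unit 8);
- length check (unit 11).

The following are illustrative assumptions: the symbol set `/` and `,`, the length limit of 2, and removing only digit runs that stand alone as words (so `MPEG-4` survives).

```python
import re
from typing import List

def split_parentheses(s: str) -> List[str]:
    """Parenthesis removal/separation (unit 6): split on half- and full-width parentheses."""
    return [p.strip() for p in re.split(r"[()（）]", s) if p.strip()]

def split_symbols(s: str, symbols: str = "/,") -> List[str]:
    """Specific symbol division (unit 7); the symbol set is an assumption."""
    pattern = "[" + re.escape(symbols) + "]"
    return [p.strip() for p in re.split(pattern, s) if p.strip()]

def remove_numbers(s: str) -> str:
    """Numeric string removal (unit 8): drop digit runs that stand alone as words."""
    return " ".join(re.sub(r"(?<!\S)\d+(?!\S)", "", s).split())

def recognize(raw: str, length_limit: int = 2) -> List[str]:
    """Chain units 6, 7 and 8, then the length check (unit 11), on one extracted string."""
    result = []
    for piece in split_parentheses(raw):
        for part in split_symbols(piece):
            cleaned = remove_numbers(part)
            # Strings no longer than the prescribed length are not extracted.
            if len(cleaned) > length_limit:
                result.append(cleaned)
    return result

recognize("CPU (Central Processing Unit)")  # ['CPU', 'Central Processing Unit']
recognize("Windows 98, Word/Excel")         # ['Windows', 'Word', 'Excel']
```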

【０１０１】辞書情報作成部１７は文字列抽出部４で抽
出された文字列である見出し語３１に対し訳語３２Ａや
品詞３２Ｂを含む辞書情報３２を付与する。The dictionary information creating unit 17 assigns the dictionary information 32 including the translated word 32A and the part of speech 32B to the headword 31 which is the character string extracted by the character string extracting unit 4.

【０１０２】辞書情報補正部１８は、辞書情報作成部１
７により作成された辞書データ３０の内容を補正するた
めに、訳語全角化部１９、訳語情報抽出部２０および訳
語情報統一部２１を含む。訳語全角化部１９は抽出され
た文字列を全角文字列に変換して辞書データ３０に見出
し語３１として登録する。訳語情報抽出部２０は、見出
し語３１に、入力文書４０中から訳語３２Ａを推定して
付与する。訳語情報統一部２１は、抽出された文字列の
類似表記が１つ以上存在する場合に類似表記のそれぞれ
に対して統一の訳語３２Ｂを付与する。The dictionary information correction unit 18 corrects the contents of the dictionary data 30 created by the dictionary information creation unit 17. For this purpose, it includes a translated-word full-width conversion unit 19, a translated-word information extraction unit 20, and a translated-word information unifying unit 21.
- The translated-word full-width conversion unit 19 converts an extracted character string into a full-width character string and registers it in the dictionary data 30 as a headword 31.
- The translated-word information extraction unit 20 estimates a translation 32A for a headword 31 from the input document 40 and assigns it.
- The translated-word information unifying unit 21 assigns a unified translation 32A to each of the similar notations when one or more similar notations of an extracted character string exist.

【０１０３】利用者支援部２２は、辞書データ３０の内
容を利用者が所望するように編集するよう支援処理す
る。そのために、利用者の所望に応じて辞書データ３０
の内容を修正するための辞書情報修正部２３、利用者が
指定する規則を選択したり、辞書作成の処理に関する各
種の定数を設定できる適用規則指示部２４、利用者が辞
書データ３０への登録の有無を検討しやすくなるための
登録保留マークを付与する規則を選択できる保留規則指
示部２５、利用者指示の判断の参考になるように後述す
る頻度情報作成部２９０から頻度情報を取得する頻度情
報作成指示部２６、および同様に後述する辞書引き結果
作成部２９１から辞書引き結果を取得する辞書引き指示
部２７、ならびに同様に後述する翻訳結果作成部２９２
から翻訳結果を取得する翻訳指示部２８を含む。The user support unit 22 performs support processing so that the user can edit the contents of the dictionary data 30 as desired. For this purpose, it includes the following units:
- a dictionary information modification unit 23 for modifying the contents of the dictionary data 30 as the user wishes;
- an applied rule designation unit 24 with which the user can select the rules to be applied and set various constants for the dictionary creation process;
- a pending rule designation unit 25 with which the user can select rules for attaching a registration-pending mark, making it easier to consider whether an entry should be registered in the dictionary data 30.

As references for the user's decisions, it also includes:
- a frequency information creation instruction unit 26 that obtains frequency information from the frequency information creation unit 290 described later;
- a dictionary lookup instruction unit 27 that obtains dictionary lookup results from the dictionary lookup result creation unit 291 described later;
- a translation instruction unit 28 that obtains translation results from the translation result creation unit 292 described later.

【０１０４】辞書情報修正部２３および適用規則指示部
２４は本発明の辞書情報所望修正部および外部指定部の
それぞれに含まれる。The dictionary information correction unit 23 and the application rule designating unit 24 are included in the dictionary information desired correction unit and the external designation unit of the present invention.

【０１０５】補助データ作成部２９は、見出し語認識部
５および利用者支援部２２での処理を補助するためのデ
ータを作成する。補助データ作成部２９は文書データ４
０から同じ文字列が文字列抽出部４により何回抽出され
たかをカウントする頻度情報作成部２９０、文書データ
４０から抽出された文字列を既存の辞書データ３０の見
出し語３１から検索する辞書引き結果作成部２９１およ
び抽出された文字列をデータ記憶部３中の各種辞書を用
いて翻訳する翻訳結果作成部２９２を含む。The auxiliary data creation unit 29 creates data for assisting the processing in the headword recognition unit 5 and the user support unit 22. It includes the following units:
- a frequency information creation unit 290 that counts how many times the same character string is extracted from the document data 40 by the character string extraction unit 4;
- a dictionary lookup result creation unit 291 that searches the headwords 31 of the existing dictionary data 30 for each character string extracted from the document data 40;
- a translation result creation unit 292 that translates the extracted character strings using the various dictionaries in the data storage unit 3.

[0106] Next, the dictionary creation operation of the dictionary creation device 500 is described with reference to the processing flowchart of FIG. 3 and the configuration of FIG. 1. Here, it is assumed that English character strings are extracted from the input document data 40, which is a Japanese document, and registered in the dictionary data 30 as headwords 31.

[0107] FIG. 5 shows an example of the input document data 40 of FIG. 4. The process begins when the user inputs the document data 40 through the input unit 1 (document input, S1 in FIG. 3). The processing from S2 onward then starts.

[0108] On the other hand, the user may input an instruction to correct the contents of the created dictionary data 30 as desired. In that case, the user support unit 22 corrects the contents of the dictionary data 30 accordingly (S10). Details are given later.

[0109] When the user inputs Japanese document data 40 such as that shown in FIG. 5 through the input unit 1, the data is stored in the data storage unit 3. It is then passed to the character string extracting unit 4 for processing (S2).

[0110] The input document data 40 of FIG. 5 may be a document written by a person or the output of an English-to-Japanese machine translation device. Formatted text, such as HTML (HyperText Markup Language) or RTF (Rich Text Format), is used only after its tags and character decoration information have been removed. Such unnecessary non-linguistic information can easily be removed with existing techniques, so that step is not described here.
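The paragraph above leaves tag removal to existing techniques. As one illustration, Python's standard `html.parser.HTMLParser` can collect only the character data of an HTML fragment. This is an assumption, not the method of this publication, and the helper names `_TextCollector` and `strip_tags` are hypothetical.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data and ignores all markup (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(html: str) -> str:
    """Return the text of an HTML fragment with all tags removed."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered at the end of the input
    return "".join(collector.parts)


print(strip_tags("<p>新製品<b>Zaurus</b>を発売</p>"))
```

This sketch keeps the contents of `<script>` and `<style>` elements, which a real pre-processing step would need to skip. RTF input would need a separate converter.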

[0111] Referring to FIG. 3, the character string extracting unit 4 extracts character strings of a specific character type from the input document data 40. At this time, the headword recognition unit 5 performs adjustment processing so that inappropriate character strings are not extracted (S3). The auxiliary data creation unit 29 creates the data that is referred to in order to assist this adjustment processing (S4).

[0112] The dictionary information creating unit 17 registers each character string extracted by the character string extracting unit 4 in the dictionary data 30 as a headword 31. It also creates the corresponding dictionary information 32 and registers it in the dictionary data 30 (S5). At this time, the dictionary information correction unit 18 corrects the created dictionary information 32 before it is registered (S6).

[0113] Thereafter, the output unit 2 reads the dictionary data 30 from the data storage unit 3. It then either outputs the data to the output device 200 or writes and registers it on the external storage medium 33 via the medium access device 34.

[0114] FIG. 6 is a processing flowchart of the character string extracting unit 4 of FIG. 1. FIG. 7 shows the extraction information 44 obtained by the processing of FIG. 6. The extraction information 44 is stored in the data storage unit 3 as part of the various data 80.

[0115] As described above, the Japanese document data 40 read by the input unit 1 is first passed to the character string extracting unit 4, which extracts the character strings to be registered as headwords 31 in the dictionary data 30. Because the document data 40 is a Japanese document, it consists mostly of Japanese character strings. English character strings also appear in it, however, such as "Copernicus" on the fourth line, "Zaurus" on the sixth line, and "ReadMe Firs" on the seventh line. These English strings are proper nouns (such as product names and personal names) or technical terms. Because there is no corresponding Japanese word, or the name or term is not widely known, the author of the Japanese document data 40 deliberately kept the original-language (English) strings as they are.

[0116] The dictionary creation device 500 according to the present embodiment extracts such character strings (word strings), written in another language or character set, from the document data 40. It then registers them as headwords 31 of the dictionary data 30.

[0117] The method by which the character string extracting unit 4 extracts character strings is now described, following the processing flow of FIG. 6.

[0118] First, a variable p indicating a character position in the input document data 40 is set to p = 0. The document data 40 is then read from its head (p = 0) (S201 to S202), and positions where the language type switches are detected (S204 to S206). In the document data 40, the first position where Japanese characters switch to English characters is the "Copernicus" portion.

[0119] A position where Japanese characters switch to English characters is detected as follows. An English character (a letter, digit, or symbol) can be represented in one byte (8 bits) and is a half-width character. Japanese, by contrast, has a very large number of characters (kanji, hiragana, and katakana), so each Japanese character is represented in two bytes (16 bits) and is a full-width character. Two English characters (two bytes) must be distinguished from one Japanese character (two bytes). Several systems (kanji code systems) therefore exist for how the two bytes of a Japanese character are used:
- In JIS, a Japanese character string is enclosed between specific codes.
- In Shift JIS, Japanese characters are shifted into specific code areas.
- In EUC (Extended Unix Code), the most significant bit of each byte is set.
In each case, a Japanese character can thereby be distinguished from two English characters (two bytes).
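A minimal sketch of the EUC case described above follows. It assumes the document is encoded in EUC-JP. A byte with the most significant bit clear is a one-byte (half-width) character. A set bit starts a multi-byte character: two bytes for JIS X 0208 and for SS2 (0x8E) half-width katakana, and three bytes after SS3 (0x8F). The function name `half_width_runs` is hypothetical. Counting SS2 katakana as multi-byte is a choice of this sketch, since those characters are displayed half-width but occupy two bytes.

```python
def half_width_runs(data: bytes) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets of runs of one-byte characters in EUC-JP data."""
    runs: list[tuple[int, int]] = []
    start = None
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                  # most significant bit clear: one-byte character
            if start is None:
                start = i             # switched from multi-byte to one-byte text
            i += 1
            continue
        if start is not None:         # switched back to multi-byte text: close the run
            runs.append((start, i))
            start = None
        i += 3 if b == 0x8F else 2    # SS3 characters take 3 bytes, all others 2
    if start is not None:
        runs.append((start, len(data)))
    return runs


data = "データCop".encode("euc_jp")
print(half_width_runs(data), data[6:9])
```
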

[0120] In the present embodiment, this difference between the full-width representation of Japanese characters and the half-width representation of English characters is used to detect where Japanese and English characters switch in the Japanese document data 40. For example, in the document data 40, the switching point in "…−タ Cop…" is stored as the half-width start position S (S207).

[0121] Similarly, the position where English characters switch back to Japanese characters is stored as the half-width end position E (S208). The character string from the half-width start position S to the half-width end position E is extracted. It is stored as an extracted character string 4A in the extraction information 44 of FIG. 7 (S209).

[0122] The same processing is repeated until no input characters remain, that is, until the last character of the document data 40 has been read (S203). At that point, every English character string in the Japanese document data 40 has been extracted as an extracted character string 4A and stored in the extraction information 44. The same English character string may appear many times in the same document data 40. For that reason, an appearance frequency 4B indicating the number of appearances is also registered for each extracted character string 4A. The extraction position of each extracted character string 4A in the document data 40 is also registered, using the positions S and E.
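The loop of S201 to S209 can be sketched on decoded text as shown below. This is an illustration under stated assumptions, not the flowchart of FIG. 6 itself:
- Any ASCII code point is treated as half-width.
- Surrounding spaces and line breaks are trimmed from each run.
- The hypothetical `extract_strings` returns each string with its start position S and end position E, plus a count corresponding to the appearance frequency 4B.

```python
from collections import Counter


def is_half_width(ch: str) -> bool:
    # Assumption: ASCII code points stand in for one-byte (half-width) characters.
    return ord(ch) < 0x80


def extract_strings(text: str) -> tuple[list[tuple[str, int, int]], Counter]:
    """Return (string, S, E) for each half-width run and the frequency of each string."""
    entries: list[tuple[str, int, int]] = []
    p, n = 0, len(text)
    while p < n:                              # stop when no characters remain
        if not is_half_width(text[p]):
            p += 1
            continue
        start = p                             # switch into half-width text
        while p < n and is_half_width(text[p]):
            p += 1
        raw = text[start:p]                   # p is where full-width text resumes
        word = raw.strip()
        if word:                              # skip runs made only of spaces/line breaks
            s = start + (len(raw) - len(raw.lstrip()))
            entries.append((word, s, s + len(word)))
    frequency = Counter(word for word, _, _ in entries)
    return entries, frequency


entries, freq = extract_strings("これはZaurusです。Zaurusと ReadMe Firstを読む")
print(entries, freq["Zaurus"])
```
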

[0123] The processing of the headword recognition unit 5 (S210) is described later. The present embodiment has been described using the extraction of English character strings from Japanese document data 40 as an example, but the invention is not limited to this. Other extractions are also possible:
- English character strings in document data 40 written in a language other than Japanese (for example, Korean).
- Conversely, Japanese character strings in English document data 40.
- Kanji phrases, hiragana strings, or katakana strings in Japanese document data 40.
Each of these can be extracted as extracted character strings 4A.

[0124] The present embodiment has been described on the basis of character-switching position information. Some languages, such as English and German, are written with almost the same character set. For these, words containing characters unique to one of the languages, for example German words containing an umlaut, may be extracted as extracted character strings 4A instead. Differences between British and American English, which likewise share a character set, can also be detected, for example by checking the basic dictionary data 60 corresponding to each language.
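For the English/German case above, the following sketch flags words that contain letters absent from English. The letter set `GERMAN_ONLY` and the function name `german_candidates` are assumptions. Such a heuristic misses German words that contain none of these letters.

```python
import re

# Assumed set of letters that occur in German but not in English.
GERMAN_ONLY = set("äöüÄÖÜß")


def german_candidates(text: str) -> list[str]:
    """Return the words of text that contain at least one German-only letter."""
    # \w matches Unicode letters in Python 3, so "Überblick" stays one word.
    return [w for w in re.findall(r"\w+", text) if GERMAN_ONLY.intersection(w)]


print(german_candidates("The Überblick chapter cites the Straße example."))
```
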

[0125] FIG. 8 is a processing flowchart of the dictionary information creating unit 17 of FIG. 1. FIG. 9 shows an example of the dictionary data 30 obtained by the processing of FIG. 8. The dictionary information creating unit 17 refers to the extraction information 44 of FIG. 7 obtained by the character string extracting unit 4. It checks, in order from the top, whether an extracted character string 4A remains. If none remains (NO in S500), processing ends. Otherwise, each extracted character string 4A is registered as a headword 31 in the dictionary data 30 of the data storage unit 3, together with its corresponding dictionary information 32. The result is the dictionary data 30 of FIG. 9 (loop of S500 and S501).

[0126] Here, English words (English character strings) whose translations have not yet been determined are to be output as they are. Therefore, noun is set as the part of speech 32B, and the same character string as the corresponding headword 31 is set as the translation 32A.

[0127] In the same way, all English extracted character strings 4A extracted from the Japanese document data 40 are registered as headwords 31. Each is registered together with dictionary information 32 consisting of the corresponding part of speech 32B and translation 32A, forming the dictionary data 30. As a result, the dictionary data shown in FIG. 9 is obtained from the document data 40.
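The registration loop of S500 and S501 uses the default dictionary information described above: part of speech noun, and a translation identical to the headword. In outline it might look like the sketch below. The `DictEntry` type and the `build_dictionary` function are hypothetical, and collapsing repeated strings into a single entry is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictEntry:
    headword: str        # corresponds to headword 31
    translation: str     # corresponds to translation 32A
    part_of_speech: str  # corresponds to part of speech 32B


def build_dictionary(extracted: list[str]) -> dict[str, DictEntry]:
    """Register each extracted string as an untranslated noun entry."""
    dictionary: dict[str, DictEntry] = {}
    for s in extracted:                 # one pass per extracted string
        if s not in dictionary:         # assumption: duplicates become one entry
            dictionary[s] = DictEntry(headword=s, translation=s, part_of_speech="noun")
    return dictionary


print(build_dictionary(["Copernicus", "Zaurus", "Zaurus"]))
```
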

[0128] FIG. 10 is a block diagram of a translation device. The device uses the dictionary data 30 created by the dictionary creation device of FIG. 1 as part of the dictionary data referred to during translation.

[0129] The dictionary data 30 created by the device of FIG. 1 is read from the data storage unit 3 under the control of the CPU 400. It is then recorded on the external storage device 33 via the medium access device 34.

[0130] Referring to FIG. 10, the translation device 501 includes:
- a CPU 600;
- a data storage unit 700 for translation, consisting of ROM and RAM;
- an output unit 800;
- an input unit 900; and
- a medium access device 341 for accessing the external storage medium 33.
The dictionary data 30 is created by the dictionary creation device 500 of FIG. 1 and recorded on the external storage medium 33. The medium access device 341 of the translation device 501 reads it, and the CPU 600 registers it in the translation dictionary data 710 in the data storage unit 700 for translation.

[0131] The translation device 501 treats the dictionary data 30 created as described above as a collection of English words that should not be translated into Japanese (non-translated words). It registers this collection in the translation dictionary data 710 as a user dictionary for English-to-Japanese translation. Newly coined words, such as proper nouns for products or people and technical terms, often have no corresponding Japanese word or none that is established. For such words, leaving them in English yields a more readable translation than forcing them into Japanese.
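Beyond treating registered entries as identity-translation nouns in a user dictionary, the publication does not state how the translation device 501 keeps them untranslated. A different technique common in translation pipelines is shown here only as a hypothetical illustration: each registered term is swapped for an opaque placeholder before translation and restored afterwards. `protect_terms`, `restore_terms`, and the `__TERMn__` token format are assumptions.

```python
import re


def protect_terms(sentence: str, terms: set[str]) -> tuple[str, dict[str, str]]:
    """Replace registered non-translated terms with opaque placeholder tokens."""
    mapping: dict[str, str] = {}
    # Longer terms first, so a multiword name wins over a shorter overlapping term.
    for i, term in enumerate(sorted(terms, key=len, reverse=True)):
        token = f"__TERM{i}__"
        pattern = r"\b" + re.escape(term) + r"\b"
        sentence, count = re.subn(pattern, token, sentence)
        if count:
            mapping[token] = term
    return sentence, mapping


def restore_terms(text: str, mapping: dict[str, str]) -> str:
    """Put the original terms back into the (translated) text."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text


masked, mapping = protect_terms("This Take Out is Popular.", {"Take Out"})
print(masked, mapping)
```

The translator then sees `__TERM0__` as an unknown token to copy through. A translated output such as 「この __TERM0__ は、好評である」 would be restored to 「この Take Out は、好評である」.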

[0132] For example, consider the sentence "This Take Out is Popular.", which contains the name of some product, "Take Out". Translated as-is, it yields the broken result 「これ、外へかかる、好評である」 ("this, goes outside, is popular"). If the dictionary data 30 created by the dictionary creation device 500 is registered as translation dictionary data 710, however, the appropriate result 「この Take Out は、好評である」 ("This Take Out is popular") is obtained.

[0133] FIGS. 11 to 13(A) and 13(B) are processing flowcharts of the headword recognition unit 5 of FIG. 1.
- FIG. 11 shows the processing of the bracket symbol removing/separating unit 6, the specific symbol dividing unit 7, and the numeral string removing unit 8 of the headword recognition unit 5.
- FIG. 12 shows the processing of the layout information checking unit 9, the word count checking unit 10, the character string length checking unit 11, the case checking unit 12, the part-of-speech checking unit 13, and the frequency information checking unit 14.
- FIGS. 13(A) and 13(B) show the processing of the dictionary lookup checking unit 15 and the translation checking unit 16.

[0134] The character string extracting unit 4 described above extracts character strings based only on where the language switches, and those strings are registered in the dictionary data 30 as headwords 31. With this registration method, consider the text 「ここでは、機械翻訳（Machine Translation）について述べる」 ("Here, machine translation (Machine Translation) is discussed"). Because the parentheses are half-width, they are extracted along with "Machine Translation" as an extracted character string 4A, which is then registered as a headword 31.

[0135] To exclude such inappropriate headwords 31 in advance, the present embodiment provides a headword recognition unit 5. It operates each time a character string 4A is extracted and recognizes only those strings, among the character strings 4A extracted by the character string extracting unit 4, that are suitable as headwords 31. The processing of the headword recognition unit 5 is described below, following the flowcharts of FIGS. 11 to 13(A) and 13(B).

[0136] 6: Bracket symbol removing/separating unit
First, consider the text 「機械翻訳（Machine Translation）の歴史は古い」 ("Machine translation has a long history"), which yields the extracted character string "(Machine Translation)". When an extracted English character string 4A contains bracket symbols in this way, the bracket symbol removing/separating unit 6 is applied and S301 to S307 of FIG. 11 are executed.

[0137] To apply the bracket symbol removing/separating unit 6, the character string extracting unit 4 registers bracket symbol positions 4D in advance, as opening/closing pairs where possible (FIG. 7). The processing of the unit 6 differs according to how the brackets appear. Consider 「電子新聞（Electronic Newspaperと…）」, which yields "(Electronic Newspaper)". When the extracted English character string 4A begins or ends with a bracket symbol in this way (S302), the bracket symbols are deleted according to the bracket symbol positions 4D. The string "Electronic Newspaper" is then registered in the extraction information 44 as an extracted character string 4A.

[0138] Consider 「…とデジタル電子図書館ＤＬ（“Digital Library”）は、」, which yields the extracted string "DL (Digital Library)". Here another English string (DL) precedes the opening bracket (S304). In this case, the string "Digital Library" inside the brackets is extracted as an extracted character string 4A according to the bracket symbol positions 4D. In addition, the leading English string "DL", which represents the abbreviation, is registered in the extraction information 44 as a separate extracted character string 4A, so that it becomes a separate headword (S305).

[0139] For text such as 「…（Ｃ）Sharp Corporation…」, the string "(C) Sharp Corporation" would be registered in the extraction information 44 as it is, as an extracted character string 4A. The entire bracketed expression may, however, be a symbol such as (C), indicating copyright, or (R), indicating a registered trademark (S306). In that case the bracketed expression is deleted, and only the string "Sharp Corporation" is registered in the extraction information 44 as an extracted character string 4A.
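The three bracket patterns handled by the bracket symbol removing/separating unit 6 can be sketched with regular expressions. The following are assumptions of this sketch:
- The check order: copyright or registered mark first, then an abbreviation followed by a bracketed expansion, then a leading or trailing bracket.
- Only half-width parentheses are handled.
- The name `split_brackets` is hypothetical.
The unit itself works from the recorded bracket symbol positions 4D rather than re-parsing the string.

```python
import re

_MARK = re.compile(r"\(([CcRr])\)\s*")              # "(C)" copyright or "(R)" registered mark
_ABBREV = re.compile(r"([^()]+?)\s*\(([^()]+)\)")  # "DL (Digital Library)"


def split_brackets(s: str) -> list[str]:
    """Return the headword candidates produced from one extracted string."""
    s = s.strip()
    m = _MARK.match(s)
    if m:                                     # S306: drop the whole bracketed mark
        return [s[m.end():].strip()]
    m = _ABBREV.fullmatch(s)
    if m:                                     # S304/S305: expansion plus abbreviation
        return [m.group(2).strip(), m.group(1).strip()]
    if s.startswith("(") or s.endswith(")"):  # S302: leading or trailing bracket
        return [s.strip("()").strip()]
    return [s]


print(split_brackets("DL (Digital Library)"))
```
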

[0140] 7: Specific symbol dividing unit
Next, consider 「高速の１００ＢＡＳＥ−ＴＸ／１０ＢＡＳＥ−Ｔ（ＲＪ４５コネクタ）ＬＡＮを備えた…」 ("equipped with a high-speed 100BASE-TX/10BASE-T (RJ45 connector) LAN..."). This yields the extracted character string 4A "100BASE-TX/10BASE-T". When an extracted string contains a slash ("/") representing parallel phrases in this way (S308), the specific symbol dividing unit 7 is applied. For the unit 7 to be applicable, the character string extracting unit 4 must recognize the symbols to be used for division ("/") in advance and store them as part of the various data 80.

[0141] In the present embodiment, the words before and after the "/" are separated and registered individually in the extraction information 44 as extracted character strings 4A (S309). For the above string, the two extracted character strings 4A "100BASE-TX" and "10BASE-T" are obtained as extraction information 44. Although "/" is used as the example here, other dividing symbols such as ":", ";", and "-" can be handled in the same way.
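A sketch of the division in S309 follows. It assumes the dividing symbols are supplied as a non-empty string of characters (`symbols`, default `"/"`), and the function name is hypothetical. Adding `"-"` to `symbols` would also split names such as `100BASE-TX`, so the set of dividing symbols has to be chosen with care.

```python
import re


def split_on_symbols(s: str, symbols: str = "/") -> list[str]:
    """Split s at any of the given dividing symbols and drop empty pieces."""
    pattern = "[" + re.escape(symbols) + "]"  # character class of dividing symbols
    return [part.strip() for part in re.split(pattern, s) if part.strip()]


print(split_on_symbols("100BASE-TX/10BASE-T"))
```
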

[0142] 8: Numeral string removing unit
Next, consider 「Power Ｅ／Ｊ３．０はPower Ｅ／ＪＶ２．１に比べ、…」 ("Compared with Power E/JV2.1, Power E/J3.0 ..."), which yields the extracted character strings "Power E/J3.0" and "Power E/JV2.1". When an extracted character string 4A ends with a numeral string in this way (S310), the numeral string removing unit 8 is applied. For the unit 8 to be applicable, the character string extracting unit 4 must recognize the numeral strings and related strings (".", "Ver", "V", and so on) that it applies to. These must be registered in advance as part of the various data 80.

[0143] Suppose an extracted English string 4A ends with a numeral string, for example one representing version information (possibly including a decimal point) (S310). If the string is extracted together with that numeral string, it can no longer be looked up as a headword 31 once only the version number changes. Registering the extracted character string 4A with the numeral string removed (S311) makes the headwords 31 in the dictionary data 30 more general. It also reduces the size of the dictionary data 30.

[0144] In the above example, "Power E/J3.0" and "Power E/JV2.1" are not both registered as extracted character strings 4A. Only "Power E/J" is registered. The latter case shows that version markers can also be handled: a string representing version information, such as "Ver.", "V.", or "Ver", may precede the numeral string, forming "Ver. + numeral" or "V + numeral". Including such forms in this processing allows them to be removed as well.
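The sketch below illustrates the numeral string removal of S311. It assumes a version number is a run of digits with optional decimal parts, optionally preceded by `Ver`, `Ver.`, `V`, or `V.`. The pattern and the function name are illustrative only. Such a pattern would also trim model numbers (for example, `MD-F20` would become `MD-F`), so a real implementation would rely on the strings registered in advance as described above.

```python
import re

# Assumed pattern: optional "Ver"/"Ver."/"V"/"V." marker, then digits with optional ".digits".
_VERSION = re.compile(r"\s*(?:Ver\.?|V\.?)?\s*\d+(?:\.\d+)*$")


def strip_version(s: str) -> str:
    """Remove a trailing version number; keep strings that are only a number."""
    stripped = _VERSION.sub("", s).rstrip()
    return stripped or s


print(strip_version("Power E/J3.0"), strip_version("Power E/JV2.1"))
```
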

[0145] 9: Layout information checking unit
Next, the layout information checking unit 9 is described. Consider the text 「♯Home Page： −ＭＤスタジオ（ＭＤ−Ｆ２０）を発売… −液晶モニタ（ＬＬ−Ｔ１４２Ａ）を発売…」 ("♯Home Page: -MD studio (MD-F20) released... -LCD monitor (LL-T142A) released..."). It yields the extracted character string "Home Page:<line feed>-MD", an English string that contains a line feed code. The unit 9 is applied to extracted character strings 4A of this kind, which contain a line feed code or similar layout information.

[0146] The layout information checking unit 9 may recognize that an extracted character string 4A, such as "Home Page:<line feed>-MD", combines strings from different lines. Such a string contains layout information, that is, non-linguistic information such as a line feed code (S312). In that case, the unit 9 splits the extracted character string 4A at the position of the line feed code. The parts are registered in the extraction information 44 as individual extracted character strings 4A (S313).

[0147] The line feed code is only one example of layout information. Other examples are:
- paragraph information (blank lines or indentation);
- bulleted-list information;
- information that decorates character strings, such as fonts and underlines.
Any of these can be used as information for splitting a single character string.

[0148] Such layout information is normally separated when the document data 40 input through the input unit 1 is read. It must, however, be retained when the layout information checking unit 9 is applied.
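For the line feed case, the split of S313 reduces to breaking the extracted string at line boundaries. A minimal sketch follows; the function name is hypothetical. Other layout cues, such as indentation or font changes, would need the retained layout information, which plain text does not carry.

```python
def split_on_layout(s: str) -> list[str]:
    """Split an extracted string at line breaks and drop empty pieces."""
    return [line.strip() for line in s.splitlines() if line.strip()]


print(split_on_layout("Home Page:\n-MD"))
```
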

[0149] 10: Word count checking unit
Consider the text 「野球ゲームPlayは、…」 ("The baseball game Play is..."). Here the character string extracting unit 4 extracts the English string "Play" as an extracted character string 4A. When such a string consists of a single English word, the word count checking unit 10 is applied.

[0150] Suppose an extracted character string 4A that is a single basic English word, as above, is registered as it is as a headword 31 in the dictionary data 30. Two problems then arise. First, the word goes untranslated even when it is used in its ordinary meaning rather than as a product name (in this example, 「プレイ」, "play"). Second, in the many cases where it is used as a verb, its part of speech is mistranslated. Together these greatly increase the likelihood of a broken translation.

[0151] Therefore, in the present embodiment, the word count checking unit 10 is applied to distinguish single words from multiword expressions. For a single word (S314), processing such as not registering the extracted character string 4A in the extraction information 44 is performed, so the English word "Play" above is not registered. Here the word count checking unit 10 treats a string of two or more words as a multiword expression. Alternatively, whether a string is a single word may be determined by whether it contains a blank (space or tab).

[0152] 11: Character string length checking unit
Next, consider 「新聞配信サービスNewspaperSelfは、…」 ("The newspaper delivery service NewspaperSelf is..."). The extracted character string 4A ("NewspaperSelf") is a single word, but a long one, so the character string length checking unit 11 is applied. Suppose the word count checking unit 10 described above excluded this single word from registration in the extraction information 44. A translation device using the resulting dictionary data 30 might then, depending on its unknown-word processing, split the word into "Newspaper" and "Self" for lookup. It could combine their translations (「新聞」, "newspaper", and 「自身」, "self") into a word-by-word rendering (「新聞−自身」). It is therefore desirable to register long English strings that may contain a suffix in the dictionary data 30, even when they are a single word (S316).

[0153] 12: Case checking unit
Next, consider 「当社の検索エンジンKeyWordは、…」 ("Our search engine KeyWord is..."), from which the character string extracting unit 4 extracts "KeyWord". When an extracted character string 4A mixes uppercase and lowercase letters in this way (S317), the case checking unit 12 is applied.

[0154] The word count checking unit 10 described above would exclude the extracted character string 4A "KeyWord" from registration in the extraction information 44. However, a noun that mixes uppercase and lowercase letters, like this word, is very likely to be a proper noun. It is therefore desirable to register it as a non-translated word, both in the extraction information 44 as an extracted character string 4A and in the dictionary data 30 as a headword 31 (S318). Conversely, it is also useful to decide whether to register a string based on the case of its letters. For example, a string consisting entirely of lowercase letters might not be registered as an extracted character string 4A or a headword 31.
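The word count, string length, and case checks (S314 to S318) can be combined into a single registration decision. This sketch rests on several assumptions:
- The threshold `min_long_length = 10` is assumed, since the text gives no value.
- "Mixed case" is taken to mean an uppercase letter after the first character together with at least one lowercase letter.
- The name `should_register` is hypothetical.

```python
def should_register(s: str, min_long_length: int = 10) -> bool:
    """Decide whether an extracted string should become a headword."""
    words = s.split()                    # spaces or tabs separate words
    if len(words) >= 2:                  # multiword expression: keep it
        return True
    if not words:
        return False
    word = words[0]
    if len(word) >= min_long_length:     # S316: long single word, e.g. "NewspaperSelf"
        return True
    has_inner_upper = any(c.isupper() for c in word[1:])
    has_lower = any(c.islower() for c in word)
    if has_inner_upper and has_lower:    # S317/S318: mixed case, e.g. "KeyWord"
        return True
    return False                         # S314: short ordinary word, e.g. "Play"


print([should_register(w) for w in ["Play", "KeyWord", "NewspaperSelf"]])
```

This sketch also rejects all-uppercase abbreviations such as `DL`, so a real system would need an extra rule for them.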

[0155] 13: Part-of-speech check unit. Next, consider the string 「この性質が広くpropagateされる。」 ("This property is widely propagated."). Here the character string extraction unit 4 extracts the English string “propagate”, which is used as a verb, as the extracted character string 4A. In this case, the part-of-speech check unit 13 is applied.

[0156] The description so far has focused on cases where the English string extracted as the extracted character string 4A is a noun, such as a product name. However, when the document data 40 is quoted from technical literature, English words other than nouns may appear. Suppose such an extracted character string 4A is registered as a headword 31 in the dictionary data 30 with noun as its part of speech 32B. Translations produced with the dictionary data 30 may then break down. When the English word is used as part of a Japanese verb, as in 「propagateされる」 above, the part of speech can be determined by identifying the Japanese that follows the extracted English string 4A, as described below. This avoids incorrect dictionary registration.

[0157] In Japanese, a non-verb word (a noun or a loanword) becomes a sa-hen (suru) verb when it is used as a verb. If the Japanese text following the extracted English string is one of the forms さ, し, す, する, しろ, or せよ (as in される), the extracted character string 4A is a verb. It is therefore not registered in the extraction information 44 as an extracted character string 4A, and its registration in the dictionary data 30 as a headword 31 is avoided (S321).
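The sa-hen test can be sketched as a prefix check on the text that follows each occurrence of the extracted string. Two points are assumptions: "following string matches a form" is read as "starts with" the form, and any single occurrence counts. The function name is hypothetical.

```python
# The six sa-hen forms listed in the source.
SAHEN_FORMS = ("さ", "し", "す", "する", "しろ", "せよ")

def used_as_sahen_verb(text: str, word: str) -> bool:
    """True if some occurrence of word in text is directly followed by a sa-hen form."""
    pos = text.find(word)
    while pos != -1:
        # str.startswith accepts a tuple of prefixes and a start offset.
        if text.startswith(SAHEN_FORMS, pos + len(word)):
            return True
        pos = text.find(word, pos + 1)
    return False
```

Note that a bare prefix test can misfire. For example, the honorific さん also begins with さ. A real implementation would likely need morphological analysis of the following text.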

[0158] Next, the auxiliary data creation unit 29 will be described. Each unit of the headword recognition unit 5 described above avoids erroneous dictionary registration using only the characters near the extracted character string 4A in the input document data 40. If auxiliary data supporting this processing can also be referenced, extracted character strings 4A that are suitable headwords 31 can be recognized more accurately. The headword recognition unit 5 references data created by the auxiliary data creation unit 29 to support headword recognition. For this purpose, the auxiliary data creation unit 29 includes a frequency information creation unit 290, a dictionary lookup result creation unit 291, and a translation result creation unit 292. The processing of the frequency information check unit 14, the dictionary lookup check unit 15, and the translation check unit 16, which use the auxiliary data creation unit 29, is described below.

[0159] The auxiliary data creation unit 29 is used not only by the headword recognition unit 5 but also by the user support unit 22 described later.

[0160] 14: Frequency information check unit. Consider the strings extracted from the document data 40 by the character string extraction unit 4. An English extracted character string 4A that appears only once in the document data 40 may indicate one of two things. The author of the document data 40 may have forgotten to use the translated term. Alternatively, the extraction by the character string extraction unit 4 may have gone wrong, for example by recognizing two English strings that happened to appear consecutively as one. Conversely, an extracted character string 4A that appears many times, and so has a high appearance frequency 4B, can certainly be used as-is as a translation without any problem.

[0161] The frequency information check unit 14 therefore obtains the appearance frequency 4B of each extracted character string 4A from the frequency information creation unit 290. These are the strings that the character string extraction unit 4 recorded in the extraction information 44. The check unit ensures that an extracted character string 4A appearing no more than a predetermined number of times is not registered as a headword 31 in the dictionary data 30, for example by deleting it from the extraction information 44.

[0162] For example, consider the string “Home Page: <line feed> -MD”, used earlier for the layout information check unit 9. It is an extracted character string 4A in which the preceding and following lines were joined. It appears only once in the input document 40, so its appearance frequency 4B is 1, and the frequency information check unit 14 can exclude it from the extraction information 44 (S323).
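The frequency check can be sketched with a counter, under stated assumptions. Strings whose count is at or below a threshold are dropped. The source does not fix the "predetermined number of times"; the default of 1 used here is an assumption that matches the single-occurrence example. The function name is hypothetical.

```python
from collections import Counter

def filter_by_frequency(extracted: list[str], threshold: int = 1) -> dict[str, int]:
    """Keep only strings that appear more than `threshold` times (appearance frequency 4B)."""
    counts = Counter(extracted)
    return {s: n for s, n in counts.items() if n > threshold}

# The joined line "Home Page:\n-MD" occurs once, so it is dropped (S323).
print(filter_by_frequency(["Home Page:\n-MD", "KeyWord", "KeyWord"]))
```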

[0163] 15: Dictionary lookup check unit. As described above, some extracted character strings 4A would become inappropriate headwords 31 that adversely affect translations using the dictionary data 30. Checking them with the dictionary lookup check unit 15 excludes them from the extraction information 44 more reliably. For example, the dictionary lookup result creation unit 291 looks up each extracted character string 4A of the extraction information 44 in the dictionary data 30, the basic dictionary data 60, and the specialized dictionary data 70 (S324). The following processing can then be performed depending on the result (S325).

[0164] Suppose the string is already registered as a headword 31 in the dictionary data 30 being created (Y in S325). In that case the extracted character string 4A is not registered in the dictionary data 30 again; only its appearance frequency 4B in the extraction information 44 is incremented. Now suppose the string exists as a headword in an entirely different dictionary, for example the basic dictionary data 60 or the specialized dictionary data 70 used by the translation device 501. If it is a basic headword there (Y in S327), it is not registered as a headword 31 in the dictionary data 30. If it is not a basic headword (N in S327), it is registered as a headword 31 in the dictionary data 30. If these dictionary lookups fail (N in S325), the string is newly registered as a headword 31 in the dictionary data 30 (S326).
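The lookup decision can be sketched as a small function, under stated assumptions. The dictionaries are modelled as plain Python containers, which is hypothetical. `created` stands for the headwords 31 already in dictionary data 30. `other` merges the basic dictionary data 60 and specialized dictionary data 70 into one mapping from headword to a flag saying whether it is a "basic" headword. The source does not say how that flag is determined.

```python
from enum import Enum

class Action(Enum):
    COUNT_ONLY = "count_only"  # already a headword 31: only bump frequency 4B
    SKIP = "skip"              # basic headword in another dictionary (Y in S327)
    REGISTER = "register"      # register as a headword 31 (N in S327, or S326)

def lookup_check(s: str, created: set[str], other: dict[str, bool]) -> Action:
    """Decide what to do with extracted string s after the dictionary lookups."""
    if s in created:
        return Action.COUNT_ONLY
    if s in other:
        return Action.SKIP if other[s] else Action.REGISTER
    return Action.REGISTER  # all lookups failed: new headword
```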

[0165] 16: Translation check unit. More accurate dictionary data 30 can also be created by using the translation check unit 16 to check the extraction information 44. This keeps inappropriate headwords 31 that would adversely affect translation results out of the dictionary data 30. As shown in FIG. 13(A), words are added before and after the extracted character string 4A to form a sentence, which the translation result creation unit 292 translates (S329). If the translation result is incorrect (N in S330), the extracted character string 4A is kept registered in the extraction information 44 (S332).

[0166] For example, suppose the extracted character string 4A is “Take Out”. The sentence “This is Take Out.” is created by adding words before and after it. The translation result creation unit 292 translates this sentence using the dictionary data 30, the basic dictionary data 60, and the specialized dictionary data 70. The result is the broken translation 「これ、である、外へかかる」 (roughly "this, is, goes outside") (N in S330). In this case, the translation check unit 16 registers “Take Out” in the extraction information 44 as an extracted character string 4A so that it will be registered as a headword 31 in the dictionary data 30 (S332). If the sentence is then translated with “Take Out” registered as a headword 31, the result is 「これは、Take Outである。」 ("This is Take Out."), which shows that the registration is effective.

[0167] The translation check unit 16 also uses the translation result to prevent the same headword from being registered twice in the dictionary data 30. This keeps the amount of data registered in the dictionary data 30 from growing unnecessarily and shortens dictionary search time during translation using the dictionary data 30. For this purpose, as shown in FIG. 13(B), the translation check unit 16 adds words before and after the extracted character string 4A to form a sentence, which the translation result creation unit 292 translates (S329). If the translation result shows the extracted character string 4A unchanged (Y in S330a), the extracted character string 4A is not registered in the extraction information 44 (S331). Otherwise (N in S330a), the extracted character string 4A is registered in the extraction information 44 (S332).

[0168] Specifically, suppose the extracted character string 4A is the product name “YourBestChoice”. The sentence “This is YourBestChoice.” is created by adding words before and after the string. The translation result creation unit 292 translates it using the dictionary data 30, the basic dictionary data 60, and the specialized dictionary data 70. Suppose the result is 「これは、YourBestChoiceである」 ("This is YourBestChoice") (Y in S330a). Then “YourBestChoice” is already registered as a headword 31 in the dictionary data 30. To avoid registering the same headword 31 twice, the extracted character string 4A is not registered in the extraction information 44 (S331).
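Both translation checks can be sketched with a pluggable translator, under stated assumptions. The source does not say how an "incorrect" translation is detected. This sketch uses one proxy for both cases. If the string survives verbatim in the output, it is treated as already handled and not registered (S331). If it does not survive, the translation is treated as broken and the string is kept (S332). The frame sentence, `translation_check`, and the canned `fake_translate` stub are all hypothetical stand-ins for the translation result creation unit 292.

```python
from typing import Callable

def translation_check(s: str, translate: Callable[[str], str],
                      frame: str = "This is {}.") -> bool:
    """True if s should be kept in the extraction information for registration."""
    result = translate(frame.format(s))
    return s not in result  # proxy: a string that vanished was mistranslated

def fake_translate(sentence: str) -> str:
    """Stub translator returning the example outputs described above."""
    canned = {
        "This is Take Out.": "これ、である、外へかかる",
        "This is YourBestChoice.": "これは、YourBestChoiceである",
    }
    return canned.get(sentence, sentence)

print(translation_check("Take Out", fake_translate))        # kept (S332)
print(translation_check("YourBestChoice", fake_translate))  # not kept (S331)
```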

[0169] FIG. 14 is a processing flowchart of the dictionary information correction unit 18 of FIG. 1. As shown in FIG. 9, the dictionary information creation unit 17 described above assigned the same string as the headword 31 as its translation 32A. The dictionary information correction unit 18 makes it possible to assign other translations 32A. For this purpose, the dictionary information correction unit 18 includes a translation full-width conversion unit 19, a translation information extraction unit 20, and a translation information unification unit 21. The operation of each unit is described below.

[0170] 19: Translation full-width conversion unit. In dictionary data for English-to-Japanese machine translation, the translation 32A is generally Japanese, written in full-width characters. The translation full-width conversion unit 19 therefore converts the string of the headword 31 to full-width characters and registers it in the dictionary data 30 as the translation 32A (S400). FIG. 15 shows an example of the processing result of the translation full-width conversion unit 19 of FIG. 14.
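The full-width conversion can be sketched with the fixed Unicode offset between printable ASCII (U+0021 to U+007E) and the Fullwidth Forms block (U+FF01 to U+FF5E). Mapping the ASCII space to the ideographic space U+3000 is an assumption; the source does not say how spaces are handled. The function name is hypothetical.

```python
FULLWIDTH_OFFSET = 0xFF01 - 0x21  # == 0xFEE0

def to_fullwidth(s: str) -> str:
    """Convert printable ASCII in s to full-width forms; leave other characters alone."""
    out = []
    for ch in s:
        code = ord(ch)
        if ch == " ":
            out.append("\u3000")  # ideographic space (assumed mapping)
        elif 0x21 <= code <= 0x7E:
            out.append(chr(code + FULLWIDTH_OFFSET))
        else:
            out.append(ch)  # already non-ASCII, e.g. kanji
    return "".join(out)

# The translation 32A for headword "KeyWord" would become "ＫｅｙＷｏｒｄ".
print(to_fullwidth("KeyWord"))
```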

[0171] 20: Translation information extraction unit. The translation information extraction unit 20 is applied when an English string enclosed in parentheses has been extracted as an extracted character string 4A. An example is the string 「電子新聞（Electronic Newspaper）と…」 used earlier for the parenthesis removal/separation unit 6.

[0172] Suppose the string matches a pattern in which a full-width string precedes the opening parenthesis, as in this example (S401). Then the full-width string immediately before the opening parenthesis is often the translation of the English string inside the parentheses. Applying the parenthesis removal/separation unit 6 described above registers the string “Electronic Newspaper” in the extraction information 44 as an extracted character string 4A and in the dictionary data 30 as a headword 31. In this embodiment, the full-width string immediately before the parenthesis (here 「電子新聞」, "electronic newspaper") is also extracted. It is registered in the dictionary data 30 as the translation 32A for this headword 31 (S402).

[0173] Conversely, the text before the parentheses may be half-width and the text inside full-width, as in the string 「Electronic Newspaper（電子新聞）」 (Y in S401). In that case the text inside the parentheses is registered in the dictionary data 30 as the translation 32A (S402). FIG. 16 shows an example of the processing result of the translation information extraction unit 20 of FIG. 14.
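Both parenthesis patterns can be sketched with two regular expressions, under stated assumptions. The source does not define where the full-width string "immediately before" the parenthesis begins. This sketch takes the maximal run of non-ASCII characters that excludes brackets and Japanese punctuation. In a phrase like 当社の電子新聞, that run would wrongly include 当社の. Both full-width （） and ASCII () parentheses are accepted. The pattern and function names are hypothetical.

```python
import re

# Assumed boundaries: a full-width run stops at brackets and 、。; an English run is printable ASCII.
_JA = r"[^\x00-\x7F（）「」、。]+"
_EN = r"[A-Za-z0-9][\x20-\x7E]*?"

JA_THEN_EN = re.compile(rf"({_JA})[（(]\s*({_EN})\s*[）)]")  # 電子新聞（Electronic Newspaper）
EN_THEN_JA = re.compile(rf"({_EN})\s*[（(]({_JA})[）)]")     # Electronic Newspaper（電子新聞）

def extract_translation_pairs(text: str) -> list[tuple[str, str]]:
    """Return (headword 31, translation 32A) pairs found around parentheses."""
    pairs = [(m.group(2), m.group(1)) for m in JA_THEN_EN.finditer(text)]
    pairs += [(m.group(1), m.group(2)) for m in EN_THEN_JA.finditer(text)]
    return pairs

print(extract_translation_pairs("電子新聞（Electronic Newspaper）と…"))
```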

[0174] 21: Translation information unification unit. Next, words that denote the same thing can end up registered as different words (headwords or translations) merely because of subtle differences. Such differences include letter case and the presence or absence of spaces or hyphens. The translation information unification unit 21 is applied to prevent this.

[0175] For example, the product name “Power Ｅ／Ｊ” is often written in subtly different ways, such as “PowerＥ／Ｊ”, “Power Ｅ／Ｊ”, and “PowerＥＪ”. The variation depends on the writer, and it can occur even with the same writer. In this embodiment, similar headwords 31 that differ only in letter case or in the presence or absence of spaces or hyphens are therefore detected in the dictionary data 30 (S403). The translation information unification unit 21 then assigns the same translation 32A to all detected similar headwords 31. As a result, the created dictionary data 30 can also serve as a dictionary for terminology unification. The translations can be unified to the translation 32A of the headword 31 that corresponds to the extracted character string 4A with the highest appearance frequency 4B (S404).
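The unification step can be sketched by grouping headwords under a normalized key and copying the translation of the most frequent member to the whole group (S404). The key is built with NFKC normalization, case folding, and removal of whitespace and hyphens. NFKC normalization is an assumption; it folds full-width Ｅ／Ｊ to E/J. The source lists only case, spaces, and hyphens. Merging "PowerＥＪ" also requires ignoring "/", which is exposed here as a hypothetical `ignored_symbols` parameter.

```python
import unicodedata
from collections import defaultdict

def variant_key(headword: str, ignored_symbols: str = "-") -> str:
    """Normalize width and case, then drop whitespace and ignored symbols."""
    s = unicodedata.normalize("NFKC", headword).casefold()
    return "".join(ch for ch in s if not ch.isspace() and ch not in ignored_symbols)

def unify_translations(entries: dict[str, str], freq: dict[str, int],
                       ignored_symbols: str = "-") -> dict[str, str]:
    """Give every headword in a variant group the translation of its most frequent member."""
    groups: dict[str, list[str]] = defaultdict(list)
    for hw in entries:
        groups[variant_key(hw, ignored_symbols)].append(hw)
    unified = {}
    for members in groups.values():
        best = max(members, key=lambda hw: freq.get(hw, 0))
        for hw in members:
            unified[hw] = entries[best]
    return unified
```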

[0176] Next, the processing of the user support unit 22 will be described. FIG. 17 shows an example of a display screen of the user support unit 22 of FIG. 1, and FIG. 18 shows another example. FIG. 17 shows the screen displayed on the output unit 2 during processing by four units of the user support unit 22. These are the dictionary information modification unit 23, the frequency information creation instruction unit 26, the dictionary lookup instruction unit 27, and the translation instruction unit 28. FIG. 18 shows the display screen during operation of the applied rule instruction unit 24 and the hold rule instruction unit 25.

[0177] As described above, the headwords 31 and dictionary information 32 needed to create the dictionary data 30 were all generated automatically. Their source is the document data 40, in which multiple languages are mixed. However, inappropriate information may still end up registered in the dictionary data 30. The dictionary creation device 500 therefore applies the user support unit 22 so that the user can check the contents of the automatically created dictionary data 30 and correct them if necessary. The processing of the user support unit 22 is described below following the flowchart of FIG. 3.

[0178] Processing by the user support unit 22 starts under two conditions (S10). The dictionary data 30 must have been created and registered in the data storage unit 3 as described above, and an instruction to modify the dictionary data 30 must be input through the input unit 1.

[0179] First, the contents of the created dictionary data 30 are displayed to the user via the output unit 2 (S11). The user is then asked whether to modify the contents headword by headword (S12). If the user indicates via the input unit 1 that the contents are to be modified per headword 31 (YES in S12), the user is asked whether support data for the modification is needed (S13). If the user indicates via the input unit 1 that support data is not needed (NO in S13), the dictionary information modification unit 23 modifies the dictionary information as follows (S15).

[0180] The user views the contents of the dictionary data 30 shown in FIG. 17 on the output unit 2. Via the input unit 1, the user clicks the information to be modified for a headword: the headword 31, the translation 32A, or the part of speech 32B. The dictionary information modification unit 23 then extracts only the clicked information, as shown in FIG. 17. For example, when the translation 32A is clicked, an edit box 95 titled "Please correct the translation." is displayed. The user edits the contents of the edit box 95 by operating the input unit 1, and the modification is reflected in the dictionary data 30 via the dictionary information modification unit 23.

[0181] If the user instead indicates that support data is needed for modifying the dictionary data 30 per headword (YES in S13), the following processing (S14) is performed.

[0182] For example, at the user's request, the frequency information creation instruction unit 26 obtains the appearance frequency 4B recorded during string extraction. It does so via the frequency information creation unit 290 of the auxiliary data creation unit 29. The value is presented as the frequency 93 shown in FIG. 17, which gives the user a criterion for deciding whether to register the corresponding headword 90 in the dictionary data 30.

[0183] When the dictionary data 30 is expected to be used by the translation device 501, the user can also invoke the dictionary lookup instruction unit 27 and the translation instruction unit 28 to present the desired support data.

[0184] For example, FIG. 17 shows two reference-information buttons: a dictionary lookup button 96 and a translation button 97. The user first selects one of the displayed headwords 31 by clicking it and then presses the dictionary lookup button 96. The dictionary lookup instruction unit 27 instructs the dictionary lookup result creation unit 291 of the auxiliary data creation unit 29 to search the existing dictionary data in the data storage unit 3 for the selected headword 31. The search result is displayed as support data.

[0185] Similarly, when the translation button 97 is pressed, the translation instruction unit 28 instructs the translation result creation unit 292 of the auxiliary data creation unit 29 to operate, and the translation result for the selected headword 31 is displayed. Dictionary lookup results and translation results are displayed in the result window 98. FIG. 17 shows the result window 98 after the translation button 97 has been pressed. By referring to the contents of the result window 98, the user can easily decide whether to finally register the corresponding headword 31 in the dictionary data 30.

[0186] If per-headword modification is not instructed (NO in S12), the user is asked whether to modify the application of the various rules in the headword recognition unit 5 (S16). If no such modification is instructed, the series of processes ends. If it is instructed (YES in S16), the following processing is performed.

[0187] Applying each unit of the headword recognition unit 5 described above is referred to here as applying a rule. Whether applying a given rule is appropriate depends on the type of the input document data 40 and on the intended use of the created dictionary data 30. For example, the document data 40 might describe a product whose model name includes a numeric string that is not a version number. In that case the numeric string removal unit 8 described above should not be applied. If the user judges on such grounds that a rule applied by the headword recognition unit 5 is inappropriate (YES in S17), the applied rule instruction unit 24 is activated (S18).

[0188] As shown in FIG. 18, the applied rule instruction unit 24 displays a window on the output unit 2 that presents the various rules of the headword recognition unit 5. For example, the rule list area 112 lists the rules applied by the parenthesis removal/separation unit 6, the specific symbol division unit 7, and the word count check unit 10. An application check area 110 is provided for specifying whether each rule is to be applied. The rules displayed in the window are not limited to those shown in FIG. 18.

[0189] In the application check area 110, for example, each rule currently applied is shown with its application button in the checked state (●). If the user does not want a rule to be applied, the user clicks the corresponding application button to clear the check (○). The headword recognition unit 5 then no longer applies that rule.

[0190] Suppose the rule application settings shown in the window of FIG. 18 are not inappropriate (NO in S17). The user may still judge that the applied rules can produce inappropriate extracted character strings 4A from the document data 40 (YES in S19). In that case the hold rule instruction unit 25 is applied (S20).

[0191] The applied rule instruction unit 24 gives only a coarse instruction on whether to apply a rule at all. Some rules, however, require the user to judge each headword 31 (each string extracted using the rule) individually. The hold rule instruction unit 25 therefore does more than specify whether each rule of the headword recognition unit 5 is applied. It also provides a registration button area 111, which instructs that a registration button 94 be attached to each headword 31 in the display window of FIG. 17. The user can then specify, for each headword 31 extracted by applying each rule, whether it is finally registered in the dictionary data 30.

[0192] As shown in FIG. 18, the hold rule instruction unit 25 can temporarily hold registrations made by applying a rule that has a high possibility of recognition errors. It attaches a registration button 94 to each affected headword 31, prompting the user to confirm its final registration in the dictionary data 30.

[0193] For example, as shown in FIG. 17, a registration button 94 is attached to the headword 31. The user can therefore indicate whether to register it with a simple click via the input unit 1. The registration button 94 initially appears in the checked state (●). To cancel the final registration of the corresponding headword 31 in the dictionary data 30, the user clicks the registration button 94 via the input unit 1 to clear the check (○). The corresponding headword 31 is then not registered in the dictionary data 30.
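The apply and hold settings can be sketched as flags on each rule, with each rule modelled as a predicate that rejects candidate strings. This is a simplified, hypothetical model. In the source, some rules transform strings rather than only rejecting them. The rule names and predicates in the example are illustrative assumptions. `enabled` plays the role of the application check area 110 (● or ○). `hold` defers a rejection to the user's registration button 94, which starts in the checked state.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    rejects: Callable[[str], bool]  # True if the rule would exclude the string
    enabled: bool = True            # application check area 110: ● (True) / ○ (False)
    hold: bool = False              # hold rule: let the user decide per headword

def recognize(candidates: list[str], rules: list[Rule]) -> tuple[list[str], list[str]]:
    """Split candidates into (registered, held); strings rejected by a non-hold rule are dropped."""
    registered, held = [], []
    for s in candidates:
        rejecting = [r for r in rules if r.enabled and r.rejects(s)]
        if not rejecting:
            registered.append(s)
        elif all(r.hold for r in rejecting):
            held.append(s)  # shown with a registration button 94
    return registered, held

def finalize(registered: list[str], held: list[str], unchecked: set[str]) -> list[str]:
    """Held headwords stay registered unless the user cleared their button (○)."""
    return registered + [s for s in held if s not in unchecked]
```

For example, a hypothetical single-word rule marked `hold=True` would send "KeyWord" to the held list rather than silently dropping it.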

[0194] The results produced by each unit of the auxiliary data creation unit 29 described above can also be stored in the data storage unit 3 as various data 80. They can then be presented via the output unit 2 at the user's request, so that the user makes the final judgment and inputs an instruction on whether to register each headword 31 in the dictionary data 30.

[0195] In addition, the user can be allowed to customize the support provided by the user support unit 22. This reduces the user's burden and provides a dictionary creation system that is efficient for each individual user.

[0196] The series of dictionary creation procedures in this embodiment is implemented by a program that follows the flowcharts described above. The program is stored in a computer-readable recording medium, hereinafter called the program recording medium. In this embodiment, the program recording medium may itself be a memory (not shown), such as a ROM, that the CPU 400 of FIG. 2 needs in order to perform processing. Alternatively, the medium access device 34 of FIG. 2 may also function as a program reading device, and the program recording medium may be read by inserting it there as the recording medium 33. In either case, the CPU 400 may access the program on the program recording medium and execute it directly. Alternatively, the program may be read from the program recording medium, downloaded temporarily into a program storage area (not shown) in the CPU 400, and then executed by the CPU 400. The program for this download is assumed to be stored in advance in the main unit of FIG. 2.

[0197] Here, the program recording medium described above is a recording medium configured to be separable from the main unit, and may be any medium that holds the program in a fixed manner, including tape media such as magnetic tape and cassette tape; magnetic disk media such as flexible disks and hard disks; optical disk media such as CD-ROM, MO, MD, and DVD; card media such as IC cards (including memory cards) and optical cards; and semiconductor memories such as mask ROM, EPROM, EEPROM, and flash ROM.

[0198] In the present embodiment, since the apparatus of FIG. 2 can be connected to a communication network including the Internet by using the input device 100, the program recording medium may also be a medium that holds the program transiently, in that the program is downloaded to the apparatus of FIG. 2 from the communication network. When the program is downloaded from the communication network in this way, the program used for the download may be stored in advance in the memory of the main unit of FIG. 2, or may be installed from another recording medium.

[0199] The contents stored on the program recording medium described above are not limited to programs and may also include data.

[0200] The embodiments disclosed herein are to be considered in all respects as illustrative and not restrictive. The scope of the present invention is defined by the claims rather than by the foregoing description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

[Brief description of the drawings]

FIG. 1 is a diagram showing the functional configuration of a dictionary creation device according to an embodiment of the present invention.

FIG. 2 is a diagram showing the hardware configuration of the dictionary creation device according to the embodiment of the present invention.

FIG. 3 is a flowchart showing an outline of the processing procedure for creating a dictionary using the functions of FIG. 1.

FIG. 4 is a diagram showing an example of the contents stored in the data storage unit of FIG. 2.

FIG. 5 is a diagram showing an example of the input document data of FIG. 4.

FIG. 6 is a processing flowchart of the character string extraction unit of FIG. 1.

FIG. 7 is a diagram showing the extraction information obtained by the processing of FIG. 6.

FIG. 8 is a processing flowchart of the dictionary information creation unit of FIG. 1.

FIG. 9 is a diagram showing an example of the dictionary data obtained by the processing of FIG. 8.

FIG. 10 is a block diagram of a translation device that uses the dictionary data created by the dictionary creation device of FIG. 1 as part of the dictionary data referred to for translation.

FIG. 11 is a processing flowchart of the headword recognition unit of FIG. 1.

FIG. 12 is a processing flowchart of the headword recognition unit of FIG. 1.

FIGS. 13A and 13B are processing flowcharts of the headword recognition unit of FIG. 1.

FIG. 14 is a processing flowchart of the dictionary information correction unit of FIG. 1.

FIG. 15 is a diagram showing an example of the processing result of the translation full-width conversion unit of FIG. 14.

FIG. 16 is a diagram showing an example of the processing result of the translation information extraction unit of FIG. 14.

FIG. 17 is a diagram showing an example of a display screen produced by the user support unit of FIG. 1.

FIG. 18 is a diagram showing another example of a display screen produced by the user support unit of FIG. 1.

[Explanation of symbols]

1 input unit, 2 output unit, 3 data storage unit, 4 character string extraction unit, 5 headword recognition unit, 17 dictionary information creation unit, 18 dictionary information correction unit, 22 user support unit, 29 auxiliary data creation unit, 30 dictionary data, 33 external storage medium, 34 medium access device, 40 input document data, 44 extraction information, 50 rule data, 500 dictionary creation device, 501 translation device.

Claims


1. A dictionary creation device for creating dictionary information in which headwords not registered in an existing dictionary are registered, the existing dictionary being used for translating from a first language represented by a first type of character into a second language represented by a second type of character, the dictionary creation device comprising: an input unit for inputting information including a document in which a plurality of types of characters are mixed; a character string extracting unit that extracts, from the document input via the input unit, one or more character strings composed of the first type of character and outputs extraction information in which the extracted character strings are registered; and a dictionary information creating unit that creates the dictionary information by using each extracted character string in the extraction information output from the character string extracting unit as a headword and assigning information about the headword as headword information corresponding to that headword.

2. The dictionary creation device according to claim 1, wherein the character string extracting unit has an adjustment processing unit having one or more processing units for adjusting the extraction of character strings so that each extracted character string is appropriate for registration as the headword in the dictionary information.

3. The dictionary creation device according to claim 2, wherein the adjustment processing unit includes a parenthesis symbol processing unit that processes the character strings before and after a parenthesis symbol in the extracted character string.

4. The dictionary creation device according to claim 2 or 3, wherein the adjustment processing unit includes a specific symbol division processing unit that divides the extracted character string before and after a specific symbol in the extracted character string into individual extracted character strings.

5. The dictionary creation device according to claim 2, wherein the adjustment processing unit includes a number string removal processing unit that removes a number string in the extracted character string.

6. The dictionary creation device according to any one of claims 2 to 5, wherein the adjustment processing unit includes a layout information processing unit that divides the extracted character string into individual extracted character strings based on information, contained in the extracted character string, indicating a layout related to the document.

7. The dictionary creation device according to claim 2, wherein the adjustment processing unit includes a word number processing unit that suppresses registration of the extracted character string in the extraction information when the number of words in the extracted character string is equal to or less than a predetermined number.

8. The dictionary creation device according to claim 7, wherein the adjustment processing unit includes a character string length processing unit that suppresses registration of the extracted character string in the extraction information when the length of the extracted character string consisting of the predetermined number of words or fewer is equal to or less than a predetermined length.

9. The dictionary creation device according to claim 7, wherein the adjustment processing unit includes a character processing unit that maintains registration of the extracted character string in the extraction information when uppercase and lowercase characters are mixed in the extracted character string consisting of the predetermined number of words or fewer.

10. The dictionary creation device according to any one of claims 2 to 9, wherein the adjustment processing unit includes a part-of-speech processing unit that determines the part of speech of the extracted character string based on the character string that follows it in the document, and determines whether to delete the registration of the extracted character string from the extraction information according to whether the extracted character string corresponds to a predetermined part of speech.

11. The dictionary creation device according to any one of claims 2 to 10, further comprising an auxiliary data creation unit that creates auxiliary data used to assist in creating the dictionary information, wherein the adjustment processing unit performs the adjustment processing with reference to the auxiliary data created by the auxiliary data creation unit.

12. The dictionary creation device according to claim 11, wherein the auxiliary data creation unit includes a frequency information creation unit that counts, for each extracted character string, the number of times the character string is extracted from the document and creates frequency information, and the adjustment processing unit includes a frequency information processing unit that deletes from the extraction information the registration of any extracted character string whose number of extractions is determined, based on the frequency information created by the frequency information creation unit, to be less than a predetermined number.

13. The dictionary creation device according to claim 11, wherein the auxiliary data creation unit includes a dictionary lookup result creation unit that searches a dictionary group including the dictionary information and the existing dictionary for the extracted character string and creates a search result, and the adjustment processing unit includes a dictionary lookup processing unit that suppresses registration of the extracted character string in the extraction information when the search result created by the dictionary lookup result creation unit indicates that the extracted character string is registered in the dictionary group.

14. The dictionary creation device according to claim 11, wherein the auxiliary data creation unit includes a translation result creation unit that translates the extracted character string with reference to the dictionary group and creates a translation result.

15. The dictionary creation device according to claim 14, wherein the adjustment processing unit includes a first translation processing unit that maintains registration of the extracted character string in the extraction information when the translation result created by the translation result creation unit is not appropriate.

16. The dictionary creation device according to claim 14, wherein the adjustment processing unit includes a processing unit that suppresses registration of the extracted character string in the extraction information when the translation result created by the translation result creation unit indicates the extracted character string.

17. The dictionary creation device according to claim 1, wherein the dictionary information creation unit includes a dictionary information correction unit for correcting the dictionary information.

18. The dictionary creation device according to claim 17, wherein the headword information includes a translation of the corresponding headword, and the dictionary information correction unit includes a translation full-width conversion unit that converts the characters of the translation corresponding to the headword into full-width characters and registers them.

19. The dictionary creation device according to claim 17 or 18, wherein the headword information includes a translation of the corresponding headword, and the dictionary information correction unit includes a translation determination unit that determines the translation of the headword from the document and adds it.

20. The dictionary creation device according to claim 17, wherein the headword information includes a translation of the corresponding headword, and the dictionary information correction unit includes a translation information unifying unit that, when one or more headwords in the dictionary information have similar character strings, assigns the same translation to those headwords.

21. The dictionary creation device according to claim 1, further comprising an information correction unit that corrects the contents of the dictionary information created by the dictionary information creation unit as desired, in accordance with information input from outside via the input unit.

22. The dictionary creation device according to claim 21, wherein the information correction unit includes an external designation unit that designates the contents of the adjustment processing by the adjustment processing unit in accordance with an external instruction via the input unit.

23. The dictionary creation device according to claim 22, wherein the external designation unit variably designates, in accordance with an external instruction via the input unit, whether or not each of the one or more processing units of the adjustment processing unit is applied.

24. The dictionary creation device according to any one of claims 21 to 23, wherein the information correction unit includes a registration determination unit that, for each of the one or more processing units of the adjustment processing unit, determines via the input unit whether or not each headword in the dictionary information to which that processing unit has been applied is finally registered in the dictionary information.

25. The dictionary creation device according to claim 22, wherein data created by the auxiliary data creation unit is presented at the time of designation by the external designation unit.

26. A dictionary creation device for creating dictionary information in which headwords not registered in an existing dictionary are registered, the existing dictionary being used for translating from a first language represented by a first type of character into a second language represented by a second type of character, the dictionary creation device comprising: an input unit for inputting information including a document in which a plurality of types of characters are mixed; a character string extracting unit that extracts, from the document input via the input unit, one or more character strings composed of the first type of character and outputs extraction information in which the extracted character strings are registered; a dictionary information creating unit that creates the dictionary information by using each extracted character string in the extraction information output from the character string extracting unit as a headword and assigning information about the headword as headword information corresponding to that headword; and an information correction unit that corrects the contents of the dictionary information created by the dictionary information creating unit as desired, in accordance with information input from outside via the input unit.

27. The dictionary creation device according to claim 26, wherein the character string extracting unit includes an adjustment processing unit for adjusting the extraction of character strings so that each extracted character string is appropriate for registration as the headword in the dictionary information.

28. The dictionary creation device according to claim 27, wherein the information correction unit includes an external designation unit that designates the contents of the adjustment processing by the adjustment processing unit in accordance with an external instruction via the input unit.

29. The dictionary creation device according to claim 27 or 28, further comprising an auxiliary data creation unit that creates auxiliary data used to assist in creating the dictionary information, wherein the adjustment processing unit performs the adjustment processing with reference to the auxiliary data created by the auxiliary data creation unit.

30. The dictionary creation device according to claim 29, wherein data created by the auxiliary data creation unit is presented at the time of designation by the external designation unit.

31. A translation device that performs translation processing from a language represented by a first type of character into a language represented by a second type of character, comprising a dictionary storage unit that stores an existing dictionary and dictionary information, the dictionary information containing one or more headwords that are extracted from a document in which a plurality of types of characters are mixed and are not registered in the existing dictionary, together with information about each headword as headword information, wherein the information in the dictionary storage unit is referred to for the translation processing.

32. A dictionary creation method for creating dictionary information used for translating from a first language represented by a first type of character into a second language represented by a second type of character, comprising: an input step of inputting information including a document in which a plurality of types of characters are mixed; a character string extracting step of extracting, based on the document input in the input step, one or more character strings composed of the first type of character and outputting extraction information in which the extracted character strings are registered; and a dictionary information creating step of creating the dictionary information by using each extracted character string in the extraction information output in the character string extracting step as a headword and assigning information about the headword as headword information corresponding to that headword.

33. A computer-readable recording medium recording a dictionary creation program for causing a computer to execute a dictionary creation method for creating dictionary information used for translating from a first language represented by a first type of character into a second language represented by a second type of character, the dictionary creation method comprising: an input step of inputting information including a document in which a plurality of types of characters are mixed; a character string extracting step of extracting, based on the document input in the input step, one or more character strings composed of the first type of character and outputting extraction information in which the extracted character strings are registered; and a dictionary information creating step of creating the dictionary information by using each extracted character string in the extraction information output in the character string extracting step as a headword and assigning information about the headword as headword information corresponding to that headword.
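The claims recite the extraction and adjustment steps only abstractly. One way several of them could fit together is sketched below in Python: extraction of first-type strings (claim 1), splitting at parentheses and specific symbols (claims 3 and 4), number string removal (claim 5), the short-string rules (claims 7 to 9), and the frequency and dictionary-lookup filters (claims 12 and 13). This is a minimal sketch under loud assumptions: the "first type of character" is taken to be printable ASCII, and the symbol set, thresholds, and every function and field name are hypothetical illustrations, not the specification's implementation. Layout (claim 6), part-of-speech (claim 10), and translation-based (claims 14 to 16) processing are not modeled.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumption: the "first type of character" is printable ASCII (Latin letters,
# digits, space, ASCII symbols); kana, kanji and full-width punctuation end a run.
FIRST_TYPE_RUN = re.compile(r"[\x20-\x7E]+")
PAREN = re.compile(r"[()]")             # split before/after parentheses (cf. claim 3)
SPECIFIC_SYMBOL = re.compile(r"[/,;]")  # hypothetical "specific symbols" (cf. claim 4)
NUMBER = re.compile(r"\b\d+\b")         # stand-alone digit strings (cf. claim 5)


@dataclass
class AdjustmentRules:
    """Hypothetical switches and thresholds; every default value is an assumption."""
    use_paren: bool = True
    use_symbol_split: bool = True
    use_number_removal: bool = True
    max_short_words: int = 1  # "predetermined number" of words (cf. claim 7)
    max_short_len: int = 3    # "predetermined length" (cf. claim 8)
    min_freq: int = 1         # minimum extraction count (cf. claim 12)


def _split(pieces: list[str], pattern: re.Pattern) -> list[str]:
    return [part for piece in pieces for part in pattern.split(piece)]


def _keep_short(s: str, rules: AdjustmentRules) -> bool:
    if len(s.split()) > rules.max_short_words:
        return True
    # Mixed upper/lower case (e.g. "PowerPoint") is kept even when short (cf. claim 9).
    if any(c.isupper() for c in s) and any(c.islower() for c in s):
        return True
    # Otherwise drop short strings such as "OK" (cf. claims 7 and 8).
    return len(s) > rules.max_short_len


def extract_strings(document: str, rules: AdjustmentRules) -> list[str]:
    """Return adjusted first-type strings in document order (duplicates kept)."""
    pieces = FIRST_TYPE_RUN.findall(document)
    if rules.use_paren:
        pieces = _split(pieces, PAREN)
    if rules.use_symbol_split:
        pieces = _split(pieces, SPECIFIC_SYMBOL)
    result = []
    for piece in pieces:
        if rules.use_number_removal:
            piece = NUMBER.sub(" ", piece)
        piece = " ".join(piece.split())  # trim and collapse whitespace
        if any(c.isalpha() for c in piece) and _keep_short(piece, rules):
            result.append(piece)
    return result


def headword_candidates(document: str, existing: set[str],
                        rules: AdjustmentRules) -> list[str]:
    """Apply the frequency (cf. claim 12) and dictionary-lookup (cf. claim 13) filters."""
    strings = extract_strings(document, rules)
    freq = Counter(strings)
    return [s for s in dict.fromkeys(strings)  # de-duplicate, keep first-seen order
            if freq[s] >= rules.min_freq and s not in existing]
```

Keeping each adjustment behind its own switch in `AdjustmentRules` loosely mirrors the idea in claim 23 that the user can choose which processing units are applied; with an input such as `"…PowerPoint (Microsoft)とOK/CPU 2000の…"`, the sketch keeps `PowerPoint` and `Microsoft`, drops `OK`, and reduces `CPU 2000` to `CPU`, which the default length threshold then drops.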