
    Hitoshi Isahara

    The question answering task is now being addressed at TREC-8 using English documents. We examined the question answering task for Japanese sentences. Our method selects the answer by matching the question sentence against knowledge-based data written in natural language. We use syntactic information to obtain highly accurate answers.
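    As an illustrative sketch of this kind of matching (the scoring function, weights, and data format are assumptions, not the paper's actual method), candidate answers can be ranked by overlap with the question, with matches on syntactic dependency triples weighted above plain word overlap:

```python
# Hypothetical sketch: rank knowledge-base entries as answer candidates by
# their overlap with the question, counting matching dependency triples
# (head, relation, dependent) more heavily than matching content words.
# The weights and the (words, deps, answer) entry format are assumptions.

def match_score(q_words, q_deps, kb_words, kb_deps, w_dep=2.0, w_word=1.0):
    """Weighted overlap between a question and one knowledge-base sentence."""
    return w_dep * len(q_deps & kb_deps) + w_word * len(q_words & kb_words)

def select_answer(q_words, q_deps, kb_entries):
    """kb_entries: list of (words, deps, answer) triples; returns best answer."""
    best = max(kb_entries, key=lambda e: match_score(q_words, q_deps, e[0], e[1]))
    return best[2]

# Example (English stand-ins for Japanese analyses):
kb = [({"Everest", "highest", "mountain"},
       {("mountain", "mod", "highest")}, "Everest"),
      ({"Nile", "longest", "river"},
       {("river", "mod", "longest")}, "the Nile")]
print(select_answer({"highest", "mountain"},
                    {("mountain", "mod", "highest")}, kb))  # -> Everest
```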
    In recent years, various types of tagged corpora have been constructed and much research using tagged corpora has been done. However, tagged corpora contain errors, which impede the progress of research. The correction of errors in corpora is therefore an important research issue. In this study we investigate the correction of such errors, which we call corpus correction. We applied corpus correction to a verb modality corpus for machine translation, using the maximum-entropy and decision-list methods as machine-learning methods. We compared several kinds of methods for corpus correction in our experiments and determined which is most effective by using a statistical test. We obtained several noteworthy findings: (1) Precision was almost the same for both detection and correction, so it is more convenient to do both correction and detection rather than detection only. (2) In general, the maximum-entropy method worked better than the decision-list method, but the two methods had almost the same precision for the top 50 pieces of extracted data when closed data was used. (3) In terms of precision, the use of closed data was better than the use of open data; however, in terms of the total number of extracted errors, the use of open data was better. Based on our analysis of these results, we developed a good method for corpus correction and confirmed its effectiveness by carrying out experiments on machine translation. As corpus-based machine translation continues to be developed, the corpus correction we discuss in this article should prove increasingly significant.
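    As a minimal sketch of this detection-and-correction idea (not the authors' implementation: the window features, the 0.9 confidence threshold, and the use of scikit-learn's logistic regression as a maximum-entropy stand-in are all assumptions), one can train a classifier on the annotated tags and flag positions where it confidently disagrees with the annotation:

```python
# Sketch of ML-based corpus correction: train a classifier on the annotated
# tags, then flag positions where the model confidently disagrees with the
# annotation and propose the model's prediction as the correction.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def context_features(tokens, i):
    """Simple word-window features around position i (an assumed feature set)."""
    return {
        "w0": tokens[i],
        "w-1": tokens[i - 1] if i > 0 else "<s>",
        "w+1": tokens[i + 1] if i < len(tokens) - 1 else "</s>",
    }

def find_corrections(sentences, threshold=0.9):
    """sentences: list of (tokens, tags) pairs; yields proposed corrections."""
    feats, tags = [], []
    for tokens, sent_tags in sentences:
        for i, tag in enumerate(sent_tags):
            feats.append(context_features(tokens, i))
            tags.append(tag)
    X = DictVectorizer().fit_transform(feats)
    # Logistic regression as a maximum-entropy classifier, trained and
    # applied on the same data (the "closed data" setting).
    model = LogisticRegression(max_iter=1000).fit(X, tags)
    proba = model.predict_proba(X)
    for j, (f, gold) in enumerate(zip(feats, tags)):
        best = proba[j].argmax()
        pred = model.classes_[best]
        if pred != gold and proba[j][best] >= threshold:
            yield f["w0"], gold, pred  # token, annotated tag, proposed tag
```

    Training and flagging on the same data corresponds to the "closed data" setting above; training on held-out data would correspond to "open data".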
    Dividing a lecture speech into segments and providing those segments as learning objects is a quite general and convenient way to construct e-learning resources. However, it is difficult to assign an appropriate title to each object that reflects its content. Since there are ...
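    The abstract is cut off above, so the paper's actual titling method is not shown. Purely as an illustration of the task, a segment can be labeled with its most distinctive terms, e.g. by TF-IDF over the segment transcripts (scikit-learn; all names here are assumptions):

```python
# Hypothetical sketch (not the paper's method): label each lecture segment
# with the terms that best distinguish it from the other segments.
from sklearn.feature_extraction.text import TfidfVectorizer

def candidate_titles(segments, n_terms=3):
    """segments: list of transcript strings; returns a keyword title per segment."""
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(segments)
    terms = vec.get_feature_names_out()
    titles = []
    for row in X.toarray():
        top = row.argsort()[::-1][:n_terms]  # indices of highest-weighted terms
        titles.append(" / ".join(terms[i] for i in top))
    return titles
```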
    This paper presents our work on acquiring translational equivalence from a Japanese-Chinese parallel corpus. We follow and extend existing word alignment techniques, including statistical and heuristic models, in order to achieve high performance. In addition to statistics drawn from the parallel corpus, lexical knowledge of the language pair, such as orthographic cognates and a bilingual dictionary, is exploited. The implemented aligner is applied to the annotation of word alignment in the parallel corpus, and an evaluation is also conducted. The experimental results demonstrate the usability of the aligner in our task.
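    A minimal sketch of the heuristic side of such an aligner (the weights, threshold, and greedy linking are illustrative assumptions, not the paper's model): score each Japanese/Chinese word pair by shared characters, since many kanji/hanzi cognates share glyphs, plus bilingual-dictionary hits, then link the highest-scoring pairs:

```python
# Sketch of a heuristic word aligner combining orthographic-cognate evidence
# (shared characters) with bilingual-dictionary lookups. All weights and the
# greedy one-to-one linking are assumptions for illustration.

def cognate_score(ja_word, zh_word):
    """Dice-style character overlap; a cheap proxy for kanji/hanzi cognates."""
    ja, zh = set(ja_word), set(zh_word)
    if not ja or not zh:
        return 0.0
    return 2 * len(ja & zh) / (len(ja) + len(zh))

def align(ja_words, zh_words, dictionary, w_dict=1.0, w_cog=0.5, threshold=0.5):
    """Greedily link the highest-scoring word pairs, one link per word.

    dictionary: set of (ja_word, zh_word) translation pairs.
    Returns a sorted list of (ja_index, zh_index) links.
    """
    pairs = []
    for i, jw in enumerate(ja_words):
        for j, zw in enumerate(zh_words):
            s = w_dict * ((jw, zw) in dictionary) + w_cog * cognate_score(jw, zw)
            if s >= threshold:
                pairs.append((s, i, j))
    used_i, used_j, links = set(), set(), []
    for s, i, j in sorted(pairs, reverse=True):  # best-scoring pairs first
        if i not in used_i and j not in used_j:
            links.append((i, j))
            used_i.add(i)
            used_j.add(j)
    return sorted(links)
```

    Character overlap is only a rough cognate proxy, since simplified hanzi and Japanese kanji forms often diverge; in practice it would supplement, not replace, the statistical model.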
    Expansion of imagination is crucial for lively creativity. However, such expansion is sometimes rather difficult, and an environment that supports creativity is required. Because people can attain higher creativity by using words with a thematic relation rather than words with a taxonomical relation, we tried to extract word lists having thematic relations among words. We first extracted word lists from domain-specific documents by utilizing inclusive relations between words based on modifiee/modifier relationships in the documents. Next, from the extracted word lists, we removed those having taxonomical relations, so as to obtain only word lists having thematic relations. Finally, based on the assumption that the kind of knowledge a person can associate when looking at a set of words correlates with how effective that word set is for creativity support, we examined whether the word lists direct us to informative pages on the Web, in order to verify the usefulness of our extracted word lists.
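    A minimal sketch of the first extraction step (the input format and grouping criterion are illustrative assumptions; the taxonomy filter is omitted): modifiers that share a common modifiee in the domain documents are grouped into a candidate word list:

```python
# Sketch: group modifiers by their shared modifiee, as observed in
# parser output over domain documents; each group of sufficient size
# becomes a candidate word list with a (hoped-for) thematic relation.
from collections import defaultdict

def extract_word_lists(dependency_pairs, min_size=3):
    """dependency_pairs: iterable of (modifier, modifiee) pairs from a parser."""
    by_head = defaultdict(set)
    for modifier, modifiee in dependency_pairs:
        by_head[modifiee].add(modifier)
    return {head: sorted(mods)
            for head, mods in by_head.items() if len(mods) >= min_size}
```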
    Research on natural language processing (NLP) started with so-called rule-based methodology; however, compiling the huge number of grammar rules and dictionary entries required proved too difficult for developing practical systems. The trend of NLP research then shifted to corpus-based, or statistical, systems. Thanks to the rapid improvement of computing power and data storage, we can nowadays utilize huge amounts of actual linguistic data. By combining such linguistic resources with high-quality language analyzers, we can extract useful linguistic information and develop practical systems for specific domains. However, the future direction of NLP is still not obvious. The fusion of knowledge and example, or knowledge processing using linguistic resources, is one possibility for developing high-performance NLP systems. As research targets, machine translation under a new paradigm and information retrieval are promising practical tasks. To realize the fusion of knowledge and example, we aim to build a computer system that utilizes linguistic knowledge at different degrees of abstraction as humans do, to model the human language function based on that system, and to learn how humans store and use this kind of knowledge in their minds.
    In this paper, we describe the details of ASPEC (Asian Scientific Paper Excerpt Corpus), the first large-scale parallel corpus in the scientific paper domain. ASPEC was constructed in the Japanese-Chinese machine translation project conducted between 2006 and 2010 using the Special Coordination Funds for Promoting Science and Technology. It consists of a Japanese-English scientific paper abstract corpus of approximately 3 million parallel sentences (ASPEC-JE) and a Chinese-Japanese scientific paper excerpt corpus of approximately 0.68 million parallel sentences (ASPEC-JC). ASPEC is used as the official dataset for the machine translation evaluation workshop WAT (Workshop on Asian Translation).
    Developing the framework proposed by Kuroda and Isahara [32] for defining a semantic role tagset for Japanese, this study (i) proposes a method, called Multilayered Semantic Frame Analysis (MSFA), for effectively linking natural language expressions to (fragments of) knowledge structures described by semantic frames, and (ii) thereby offers a partial solution to the “language-knowledge linking problem”. MSFA forms a basis for “semantic role tagging”, which is required for our development of a Japanese corpus annotated for semantic roles.
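    As an illustrative sketch of the kind of record such frame-based annotation might produce (the class, field names, and frame/role labels are assumptions, not the actual MSFA format), each text span can be linked to a role in a semantic frame on a named analysis layer:

```python
# Hypothetical sketch of a frame-annotation record: one link per text span,
# naming the analysis layer, the semantic frame, and the role within it.
from dataclasses import dataclass

@dataclass
class RoleLink:
    layer: str   # which analysis layer the frame belongs to (assumed field)
    frame: str   # semantic frame name (illustrative label)
    role: str    # role within the frame
    span: tuple  # (start, end) character offsets in the sentence

# e.g. annotating "Taroo ga hon o katta" ("Taro bought a book"):
links = [
    RoleLink("commercial", "Commerce_buy", "Buyer", (0, 5)),    # "Taroo"
    RoleLink("commercial", "Commerce_buy", "Goods", (9, 12)),   # "hon"
]
```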
