
IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Special Section on Robust Speech Processing in Realistic Environments
Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System
Tobias CINCAREK, Hiromichi KAWANAMI, Ryuichi NISIMURA, Akinobu LEE, Hiroshi SARUWATARI, Kiyohiro SHIKANO

2008 Volume E91.D Issue 3 Pages 576-587

Abstract

In this paper, the development, long-term operation and portability of a practical ASR application in a real environment are investigated. The target application is a speech-oriented guidance system installed at a local community center. The system has been open to the general public since November 2002. More than 300 hours of data, or more than 700,000 inputs, have been collected over four years. The outcome is a rare example of a large-scale real-environment speech database. Using this database, a simulation experiment is carried out to investigate how the system's performance improves during the first two years of operation. The purpose is to determine empirically how much real-environment data must be prepared to build a system with reasonable speech recognition performance and response accuracy. Furthermore, the relative importance of developing the main system components, i.e., the speech recognizer and the response generation module, is assessed. Although the exact figures depend on the system's modeling capacity and the domain complexity, experimental results show that overall performance stagnates after about 10-15k utterances are used for training the acoustic model, 40-50k utterances for training the language model and 40-50k utterances for compiling the question and answer (Q&A) database. The Q&A database was the most important component for improving the system's response accuracy. Finally, the portability of the well-trained first prototype to a different environment, a local subway station, is investigated. Since collecting and preparing large amounts of real data is generally impractical, only one month of data from the new environment is employed for system adaptation. While the speech recognition component of the first prototype proves highly portable, the response accuracy is lower than in the first environment. The main reason is a difference in domain between the two systems, which arises because they are installed in different environments.
This implies that it is imperative to take the behavior of users under real conditions into account in order to build a system with high user satisfaction.
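The abstract does not specify how the response generation module uses the Q&A database. One plausible design for such guidance systems is example-based: the recognizer's output is compared against stored example questions, and the answer paired with the closest match is returned. The Python sketch below illustrates that idea under stated assumptions only: the bag-of-words cosine similarity, the rejection `threshold`, the placeholder entries and the names `QA_DATABASE` and `select_response` are all hypothetical and not taken from the paper. It does suggest why database coverage matters: an input can only receive an appropriate answer if a sufficiently similar example question has been collected.

```python
import math
from collections import Counter

# Hypothetical Q&A database (placeholder entries, not from the paper):
# each pair is (example user question as ASR text, answer identifier).
QA_DATABASE = [
    ("what time does the center close", "ANSWER_CLOSING_TIME"),
    ("where is the library", "ANSWER_LIBRARY_LOCATION"),
    ("what is the weather today", "ANSWER_WEATHER"),
]


def bag_of_words(text):
    """Lower-cased word counts; a deliberately simple assumed representation."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_response(asr_hypothesis, database=QA_DATABASE, threshold=0.3):
    """Return the answer of the most similar example question.

    Returns None (i.e. reject the input) when no example question is
    similar enough; the threshold value is an assumption.
    """
    query = bag_of_words(asr_hypothesis)
    best_score, best_answer = 0.0, None
    for question, answer in database:
        score = cosine(query, bag_of_words(question))
        if score > best_score:  # strict '>' keeps the first entry on ties
            best_score, best_answer = score, answer
    return best_answer if best_score >= threshold else None
```

For example, `select_response("where is the library please")` returns `"ANSWER_LIBRARY_LOCATION"`, while an input with no overlapping words, such as `"hello"`, is rejected with `None`. Adding more collected example questions widens the range of inputs that find a close match, which is one intuitive reading of why the Q&A database mattered most for response accuracy.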

© 2008 The Institute of Electronics, Information and Communication Engineers