JP2007164732A

JP2007164732A - Computer executable program and information processing device

Info

Publication number: JP2007164732A
Application number: JP2005364057A
Authority: JP
Inventors: Yoshiharu Asai; 芳治浅井
Original assignee: CRESCENT KK
Current assignee: CRESCENT KK
Priority date: 2005-12-16
Filing date: 2005-12-16
Publication date: 2007-06-28

Abstract

PROBLEM TO BE SOLVED: To provide a technology that allows a voice recognition function to be added to a program constituting a user interface without changing the program.

SOLUTION: An information processing device is equipped with: a means for retrieving screen display definition information that includes a user interface component defined by component information and defines the display form of a first screen part formed on a computer screen; a means for displaying a second screen part including the user interface component; a means for storing an utterance component table that associates character string information specified by an utterance with the component information; a means for acquiring character string information generated by receiving an utterance; a means for specifying the user interface component corresponding to the generated character string information; and a means for executing a process corresponding to the user interface component.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a computer-executable program for incorporating a voice recognition function into a user interface, and to an information processing apparatus equipped with the program.

Conventionally, Patent Document 1 below, for example, is known as a technique for incorporating a voice recognition function into a user interface such as an application program usable on the web. In this technique, a web server and a voice server are provided on a network, and the two servers notify each other of their status and synchronize, thereby seamlessly coordinating voice access and web access from terminals on the network.

However, such a configuration requires a complicated system in which the web server and the voice server notify each other of their status and synchronize over the network. To enable voice access to an existing user interface more simply, the voice recognition function may instead be provided in the terminal itself. The terminal then converts speech into a character string and inputs the converted character string into the user interface.

However, adding a voice recognition function to a user interface presented through an ordinary screen requires installing a speech recognition engine (see, for example, Non-Patent Document 1) on the terminal and also providing, in the computer program constituting the user interface (hereinafter simply referred to as the program), an interface portion for acquiring the information recognized by the speech recognition engine.

Providing such an interface portion in a program normally requires modifying and recompiling the source program. In other words, a program incorporating an interface to the speech recognition engine must be newly developed. Consequently, adding a voice recognition function, without a version upgrade, to a program that has already been distributed to end users is not normally contemplated in the current state of the art.
Patent Document 1: JP 2004-246865 A
Non-Patent Document 1: "Speech recognition engine", [online], IBM Japan, Ltd., [retrieved December 12, 2005], Internet (URL: http://www-06.ibm.com/jp/voiceland/technology/p03.html)

The present invention has been made to solve this problem. An object of the present invention is to provide a technique by which a voice recognition function can be added to a program constituting a user interface without modifying the program.

To solve the above problem, the present invention employs the following means. That is, the present invention is an information processing apparatus comprising: means for retrieving component information defining a user interface component from screen display definition information that defines the display mode of a first screen portion which includes the user interface component and is configured on a computer screen; means for accepting input of character string information that corresponds to the user interface component and is to be specified by an utterance; and means for storing the character string information in an utterance component table in association with the component information.

The present invention may also be an information processing apparatus comprising: means for retrieving screen display definition information that defines the display mode of a first screen portion which includes a user interface component defined by component information and is configured on a computer screen; means for displaying a second screen portion including the user interface component; means for storing an utterance component table that associates character string information specified by an utterance with the component information; means for acquiring character string information generated upon receiving an utterance; means for specifying the user interface component corresponding to the generated character string information; and processing means for executing processing corresponding to the user interface component.

According to the present invention, by associating character string information specified by an utterance with the component information defining a user interface component of the first screen portion, processing corresponding to that user interface component can be executed by utterance through the second user interface component.

The present invention may be a method in which a computer executes any of the processes described above. The present invention may also be a computer-executable program that causes a computer to function as any of the above means. Further, the present invention may be a computer-readable recording medium on which such a computer-executable program is recorded.

According to the present invention, a voice recognition function can be added to a program constituting a user interface without modifying the program.

Hereinafter, an information system according to the best mode for carrying out the present invention (hereinafter referred to as the embodiment) will be described with reference to the drawings. The configuration of the following embodiment is an example, and the present invention is not limited to the configuration of the embodiment.

<Outline of information system>
FIG. 1 shows a configuration diagram of the information system according to the present embodiment. This information system includes a server 1 that provides, on web pages (also referred to as a website), services by a web application program to user computers on a network (hereinafter referred to as clients 2), and a client 2 that receives the services from the server 1.

Here, the network may be a public network such as the Internet, or a private network configured by a LAN (Local Area Network), a dedicated line, a VPN (Virtual Private Network), or the like.

The server 1 executes a web server program and provides the client 2 with services by an application program usable through web pages. The server 1 transmits to the client 2 information described in, for example, HTML (HyperText Markup Language) or XML (eXtensible Markup Language). The server 1 also executes a cooperating program from information described based on JSP (Java (registered trademark) Server Pages) or IIS (Internet Information Services). The server 1 then dynamically generates a web page and transmits it to the client 2.

The client 2 executes, for example, a browser program as an application program. The client 2 requests the server 1 to provide information in HTML, XML, or the like, and displays the provided information on the screen of its display device. As a result, the web page of the server 1 is displayed on the display device of the client 2, and the user interface of an application program executed on the server 1, or on another computer cooperating with the server 1, becomes available on the client 2.

Each of the server 1 and the client 2 has a CPU, a memory, an input/output interface, a display device, a hard disk, a communication interface to the network, a microphone that picks up the user's utterances, a speaker that outputs sound, a drive for removable portable media, and the like. The server 1 and the client 2 realize their respective functions by executing their respective computer programs. In any case, the components and operations of the server 1 and the client 2 are widely known, and a description of them is omitted.

A feature of this information system is that, when a user accesses a web page displayed on the client 2 (and the user interface of an application program executed on the server 1), the user can do so by utterance.

That is, the client 2 recognizes the content of the user's utterance and generates a character string corresponding to it. The client 2 then applies that character string to the user interface displayed as a web page. For example, the client 2 displays the web page corresponding to the character string, or gives focus to the window displaying that web page (puts it in the state of having been selected with a pointer such as a mouse).

The client 2 also sets a character string in an on-screen component, such as a text input field, that carries a label corresponding to the character string. The client 2 also selects the option corresponding to the character string from the list of a pull-down menu, and presses a button carrying a label corresponding to the character string. In this way, the information system allows the user to operate graphical user interface components on the web by voice.

The server 1 supports the construction of such an utterance-based user interface. The server 1 has a program (hereinafter referred to as the utterance definition tool) for relating utterance content on the client 2 to user interface components on a web page.

The utterance definition tool analyzes a web page designated by the user who manages the server 1 and picks out the user interface components arranged on that web page. Here, the user who manages the server 1 is a user who distributes web pages to the client 2 and lets the client 2 use the application program, for example, an application service provider.

The utterance definition tool analyzes the definition file describing the designated web page, for example, an HTML, XML, JSP, or IIS file, and extracts the structure of the user interface components contained in the web page. The utterance definition tool then accepts voice information for selecting each extracted user interface component.
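The extraction step can be pictured with a minimal sketch, assuming the definition file is plain HTML and that form elements count as user interface components. The `UI_TAGS` set, the `ComponentExtractor` class, and the sample markup are illustrative assumptions and are not taken from the disclosed tool; parsing relies on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Tags treated as user interface components in this sketch (an assumption).
UI_TAGS = {"input", "select", "option", "button", "textarea"}


class ComponentExtractor(HTMLParser):
    """Collects the tag name and attributes of each form-like element."""

    def __init__(self):
        super().__init__()
        self.components = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names before calling this method.
        if tag in UI_TAGS:
            self.components.append({"tag": tag, "attrs": dict(attrs)})


def extract_components(html_text):
    """Return the user interface components found in an HTML string."""
    parser = ComponentExtractor()
    parser.feed(html_text)
    return parser.components


# Hypothetical page used only for illustration.
SAMPLE = """
<form>
  <select name="product"><option value="a">A</option></select>
  <input type="text" name="qty">
  <button name="order">Order</button>
</form>
"""
components = extract_components(SAMPLE)
```

Each entry in `components` keeps the tag name and its attributes, which is roughly the kind of tag information that could later be paired with a spoken reading.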

For example, the utterance definition tool has the user select a specific user interface component, such as a pull-down menu labeled "Product selection" (商品選択), and accepts the selection. In that state, the user speaks the words that the user would utter when wanting to focus that pull-down menu.

The spoken voice information is then detected by, for example, a microphone and converted into a character string (for example, "しょうひんせんたく", the phonetic reading "shohin sentaku") by a speech-to-text conversion tool (a so-called speech recognition program). The converted character string is then passed to the utterance definition tool.

The utterance definition tool stores in a database, in association with one another, information identifying the user interface (for example, the URL indicating the location of the HTML file and the tag information in that HTML file that displays the pull-down menu), the character string shown on the user interface (for example, "Product selection"), and the character string converted from the utterance (for example, "shohin sentaku").
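As a rough illustration of the association described above, one record could be modeled as follows. This is a sketch under stated assumptions: the class name `UtteranceDefinition`, its field names, the example URL, and the selector-style tag string are all hypothetical, and the actual table layout of the embodiment is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UtteranceDefinition:
    """One association between a spoken reading and a UI component (hypothetical layout)."""
    page_url: str   # location of the HTML file that defines the screen
    tag_info: str   # tag information identifying the component in that file
    label: str      # character string shown on the user interface
    reading: str    # character string converted from the utterance


definition = UtteranceDefinition(
    page_url="http://example.com/order.html",  # assumed example URL
    tag_info='select[name="product"]',         # assumed identifier format
    label="Product selection",
    reading="shohin sentaku",
)

# A dictionary keyed by reading allows lookup from a recognized string.
table = {definition.reading: definition}
```

Keying the table by the reading reflects the direction of lookup at run time: the recognized string comes first, and the component is found from it.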

The database stores utterance definition information, which relates utterance content to user interface components, and various kinds of management information (referred to as masters) for constructing the utterance definition information.

The database is provided to the client 2 from the server 1 in advance. In addition to the speech-to-text conversion tool, a program called the engine is installed on the client 2.

Based on the character string passed from the speech-to-text conversion tool, the engine searches the database, identifies the user interface component associated with that character string, and executes processing corresponding to that user interface component. For example, the engine displays the browser window containing the user interface component corresponding to the character string.

The engine also changes the stacking order of that window relative to other windows on the display. For example, the engine brings the browser window containing the user interface component to the top. The engine also selects the user interface component (positions the pointer on it) or sets a character string in it.

In this way, in this information system, the utterance definition information is constructed by the utterance definition tool on the server 1 and provided from the server 1 to the client 2. On the client 2, the speech-to-text conversion tool converts utterance content into a character string. The engine, which cooperates with the speech-to-text conversion tool, identifies from the converted character string the user interface component to be matched and executes the processing corresponding to that user interface component.

Thus, in this information system, an utterance-based user interface can be added to the user interface provided as web pages of the server 1 without modifying the programs of the server 1.

FIG. 2 is a flowchart showing processing on the server 1. The flowchart is divided into three vertical regions (left, center, and right), each representing a program executed on the server 1. The flowchart of FIG. 2 therefore also shows how the programs cooperate. In FIG. 2, the left region shows the application forming the user interface, the center region shows the utterance definition tool, and the right region shows the speech-to-text conversion unit. The speech-to-text conversion unit includes, for example, a commercial speech recognition program and the utterance definition information obtained as a result of speech recognition.

In the present embodiment, there is no restriction on the speech recognition program; any program may be used as long as it has an interface to general application programs.

In this processing, the utterance definition tool is first started on the server 1 (S1). The operation screen of the utterance definition tool is then displayed on the display device of the server 1. It is assumed that the user interface of the application program to which the voice recognition function is to be added (for example, the user interface screen of a web application on a browser screen) has already been started on the server 1.

The user drags and drops the screen corresponding to that user interface onto the operation screen of the utterance definition tool (S2). The utterance definition tool then acquires identification information (a URL, Uniform Resource Locator, or the like) indicating the location of the definition information that defines the user interface (corresponding to the screen display definition information of the present invention), for example, an HTML file or XML file. As is well known, this definition information is stored on the hard disk of the server 1 or in a storage device of another computer connected to the server 1 through the network.

The utterance definition tool then analyzes the information defining the user interface (the CPU of the server 1 executing this processing corresponds to the means for retrieving component information) and acquires the definition information (corresponding to the component information of the present invention) of the components constituting the user interface (corresponding to the user interface components of the present invention), such as labels, input fields, pull-down menu lists, and push-button labels. The utterance definition tool then generates a screen showing that user interface (S3).

Next, in response to user operations, the utterance definition tool sets a spoken reading for each field or each user interface component on the user interface (S4). That is, in response to an operation through the user's input device (a keyboard or a pointing device such as a mouse), the utterance definition tool selects (focuses) the user interface component for which a reading is to be set. In that state, speech input through the microphone is converted into a character string by the speech-to-text conversion tool. The utterance definition tool acquires the converted character string through the application interface of the speech-to-text conversion tool (the CPU of the server 1 executing this processing corresponds to the means of the present invention for accepting input of character string information corresponding to the user interface component). Instead of speech input through the microphone, however, the utterance character string may be entered manually using a keyboard, a pointing device, or the like.

Further, attributes are set for the user interface component as necessary. The information identifying the user interface component (for example, a tag in the HTML file), the character string based on the speech, the attribute information, and so on are then stored as a set in the database of the speech-to-text conversion unit (S5). In FIG. 2, this database is shown as a dictionary file and a profile.

Next, the utterance definition tool creates the specific information for operating the target application and stores it in the database (S6). That is, the character string converted in S4, the attributes set in S5, and so on are stored in the database in association with the currently selected user interface component (the database corresponds to the means of the present invention for storing in the utterance component table).
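The registration in S4 to S6 can be sketched with an in-memory SQLite table. This is only an illustration under assumptions: the single-table layout, the column names, and the helper functions `register` and `find_component` are hypothetical, and the embodiment itself speaks of a dictionary file and a profile rather than SQLite.

```python
import sqlite3

# In-memory database standing in for the dictionary file and profile (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE utterance_component ("
    " page_url TEXT, tag_info TEXT, reading TEXT, attributes TEXT,"
    " PRIMARY KEY (page_url, reading))"
)


def register(conn, page_url, tag_info, reading, attributes=""):
    """Store a reading and attributes in association with the selected component."""
    conn.execute(
        "INSERT OR REPLACE INTO utterance_component VALUES (?, ?, ?, ?)",
        (page_url, tag_info, reading, attributes),
    )
    conn.commit()


def find_component(conn, page_url, reading):
    """Return the tag information registered for a reading, or None."""
    row = conn.execute(
        "SELECT tag_info FROM utterance_component"
        " WHERE page_url = ? AND reading = ?",
        (page_url, reading),
    ).fetchone()
    return row[0] if row else None


# Hypothetical registration of one pull-down menu.
register(conn, "http://example.com/order.html",
         'select[name="product"]', "shohin sentaku")
```

Making the pair of page URL and reading the primary key is a design choice of this sketch: it lets the same spoken word map to different components on different pages.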

FIG. 3 is a flowchart showing processing on the client 2 side, which uses the user interface to which the voice recognition function has been added by the settings of FIG. 2. The flowchart is divided into three vertical regions (left, center, and right), each representing a program executed on the client 2. The flowchart of FIG. 3 therefore also shows how the programs cooperate. In FIG. 3, the left region shows the application forming the user interface, the center region shows the engine that controls the voice recognition function, and the right region shows the speech-to-text conversion unit.

The database information set in S4 to S6 of FIG. 2 (the dictionary file and profile, corresponding to the utterance component table of the present invention) is downloaded to the client 2 in advance (the CPU of the client 2 executing this processing corresponds to the means for receiving the information of the utterance component table from another information processing apparatus, and the CPU of the server 1 providing the database information corresponds to the means of the present invention for providing the information of the utterance component table). The database information may be downloaded from the server 1 every time the client 2 accesses the server 1. Alternatively, it may be downloaded from the server 1 when, upon the client 2 accessing the server 1, it is detected that the client 2 does not have the database information, or when it is detected that the database information has been updated. The hard disk of the client 2 that stores this database corresponds to the means of the present invention for storing the utterance component table.
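The three download policies mentioned above can be summarized in a small decision function. This is a sketch under assumptions: the text does not say how an update is detected, so integer version numbers and the function name `needs_download` are hypothetical.

```python
def needs_download(local_version, server_version, always=False):
    """Decide whether the client should fetch the database information.

    local_version is None when the client holds no copy. Version numbers
    are an assumption of this sketch, used only to detect an update.
    """
    if always:
        # Policy 1: download on every access to the server.
        return True
    if local_version is None:
        # Policy 2: download when the client has no database information.
        return True
    # Policy 3: download when the server copy is newer than the local one.
    return local_version < server_version
```

A client holding version 2 while the server holds version 3 would download; a client already at version 3 would not, unless the `always` policy is in force.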

A general browser and the speech-to-text conversion tool are also installed on the client 2. In addition, the engine, which receives the character string that is the speech recognition result of the speech-to-text conversion tool and searches the database (dictionary file and profile), is installed. When the server 1 detects that the engine is not installed on the client 2, the engine and the database information may be downloaded to the client 2. The CPU of the server 1 executing such processing corresponds to the means of the present invention for distributing the computer program.

With this installation complete, the engine is first started (S11). When the user interface of an application program (corresponding to the first screen portion of the present invention) is started while the engine is running (S12A), the engine acquires identification information (a URL or the like) indicating the location of the definition information that defines the user interface of that application program (corresponding to the screen display definition information of the present invention), for example, an HTML file or XML file. For example, the engine detects the URL each time the browser switches the URL being displayed. The engine then determines whether that URL matches a URL set in the database (dictionary file and profile).

If the identification information is set in the database, the engine determines that the user interface defined by that identification information is a target of voice recognition. In that case, the engine reads the definition information defining the user interface components (corresponding to the component information of the present invention) from the storage location indicated by the identification information and generates a pseudo screen (corresponding to the second screen portion of the present invention) to be superimposed on the original user interface (S13).
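The URL check that decides whether a page is a target of voice recognition can be sketched as follows. The `Engine` class, its `on_url_changed` method, and the dictionary-shaped `DATABASE` are hypothetical names introduced for illustration; how the engine actually observes the browser's URL, and how the pseudo screen is drawn, are not modeled.

```python
# Hypothetical view of the downloaded database: URL -> {reading: tag information}.
DATABASE = {
    "http://example.com/order.html": {
        "shohin sentaku": 'select[name="product"]',
    },
}


class Engine:
    """Tracks whether the page currently shown is registered for voice input."""

    def __init__(self, database):
        self.database = database
        self.active_entries = None  # readings valid for the current page

    def on_url_changed(self, url):
        """Handle a URL switch; return True if a pseudo screen should be built (S13)."""
        self.active_entries = self.database.get(url)
        return self.active_entries is not None


engine = Engine(DATABASE)
```

Calling `engine.on_url_changed(...)` with a registered URL returns `True` and loads that page's readings; an unregistered URL returns `False` and clears them, so utterances are matched only against the page in view.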

In this state, therefore, the pseudo screen is displayed on the display device, overlapping the user interface of the application the user is running (the client 2 controlling this display corresponds to the means of the present invention for displaying the second screen portion). To the user, however, it appears as if the user interface of the original application program is being displayed. In the information system of the present embodiment, the steps up to this point are referred to as the preparation work on the client 2 side.

With the preparation work complete, the user performs voice input. For example, the user speaks into the microphone (S14). The speech-to-text conversion tool then converts the speech into ASCII codes (S15) and generates a character string (text) from the ASCII codes.

The recognized character string is then passed to the engine (S17; the CPU of the client 2 executing this processing corresponds to the means for acquiring character string information generated upon receiving an utterance). The engine searches the database (dictionary file and profile) based on the character string passed from the speech recognition tool (the CPU of the client 2 executing this processing corresponds to the means for specifying the user interface component corresponding to the generated character string information).

その文字列に対応づけたユーザインターフェース部品がデータベースに定義されていた場合、エンジンはそのユーザインターフェース部品に応じた処理を実行する（Ｓ１９、この処理を実行するクライアント２のＣＰＵがユーザインターフェース部品に応じた処理を実行する処理手段に相当する）。 If a user interface component associated with the character string is defined in the database, the engine executes the process corresponding to that user interface component (S19; the CPU of the client 2 executing this process corresponds to the processing means for executing the process corresponding to the user interface component).

例えば、エンジンは、そのユーザインターフェース部品が画面の一部を構成するウィンドウである場合には、そのウィンドウを表示する。また、エンジンは、そのウィンドウを複数ウィンドウからなる階層のうちの最上位の階層に表示する。また、そのユーザインターフェース部品がテキスト入力フィールドである場合には、エンジンは、その入力フィールドに文字列を設定する。また、そのユーザインターフェース部品がプルダウンメニュのタイトルである場合、エンジンは、そのプルダウンメニュのリスト（選択肢）を表示する。また、そのユーザインターフェース部品がプルダウンメニュのリストに含まれる要素（選択肢）の１つである場合、エンジンは、その選択肢を選択する。また、そのユーザインターフェース部品が押しボタンのラベルである場合、エンジンは、その押しボタンを押下する。このようにして、エンジンは、Ｓ１４からＳ１９までの処理が繰り返すように制御する。 For example, when the user interface component is a window constituting a part of the screen, the engine displays that window, placing it at the top level of the hierarchy of windows. When the user interface component is a text input field, the engine sets a character string in that input field. When the user interface component is the title of a pull-down menu, the engine displays the list of options of that pull-down menu. When the user interface component is one of the elements (options) in a pull-down menu list, the engine selects that option. When the user interface component is the label of a push button, the engine presses that push button. In this way, the engine repeats the processes from S14 to S19.
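
The per-component behavior described above can be sketched as a small dispatch function. This is only an illustration under assumptions: the `Component` class, the `kind` strings, and the `state` dictionary are hypothetical stand-ins for the real widgets, which the description does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    kind: str    # "window", "text", "menu_title", "menu_option" or "button" (assumed names)
    label: str
    state: dict = field(default_factory=dict)


def execute(component: Component, text: str = "") -> str:
    """Apply the action for the component kind and describe what was done."""
    if component.kind == "window":
        # Show the window and raise it to the top of the window hierarchy.
        component.state.update(visible=True, topmost=True)
        return "show window on top"
    if component.kind == "text":
        component.state["value"] = text
        return "set text"
    if component.kind == "menu_title":
        component.state["expanded"] = True
        return "show options"
    if component.kind == "menu_option":
        component.state["selected"] = True
        return "select option"
    if component.kind == "button":
        component.state["pressed"] = True
        return "press button"
    raise ValueError(f"unknown component kind: {component.kind}")
```

In a real engine each branch would drive the pseudo screen rather than a dictionary; the dictionary only makes the effect of each branch observable.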

＜データ構造＞ <Data structure>
以下、本実施形態の情報システムが使用するデータベース（辞書ファイルおよびプロファイル）のデータ構造を説明する。本実施形態では、データベースは、複数のテーブルから構成され、例えば、ハードディスク等の記憶装置に記憶されている。 The data structure of the database (dictionary file and profile) used by the information system of this embodiment is described below. In this embodiment, the database is composed of a plurality of tables and is stored in a storage device such as a hard disk.

図４は、ＵＲＬマスタと呼ばれるテーブルの構成を示す図である。ＵＲＬマスタは、ブラウザに表示されるウェブページを定義する定義情報の格納先を記録する。すなわち、ＵＲＬマスタは、エンジンの処理対象であるユーザインターフェースを示す情報を格納している。 FIG. 4 is a diagram showing a configuration of a table called a URL master. The URL master records the storage location of the definition information that defines the web page displayed on the browser. That is, the URL master stores information indicating the user interface that is the processing target of the engine.

図４のように、ＵＲＬマスタは、テーブルの各行を識別する情報のフィールド（Ｌ＿ＩＤ）、ＵＲＬを格納するフィールド（Ｓ＿ＵＲＬ）、そのウェブページのタイトルを格納するフィールド（Ｓ＿ＴＩＴＬＥ）、そのＵＲＬをデータベースに登録した日付（Ｄ＿ＲＥＧＩＳＴＥＲ）、そのＵＲＬの情報を更新した日付（Ｄ＿ＵＰＤＡＴＥ）等を有している。 As shown in FIG. 4, the URL master stores a field (L_ID) of information for identifying each row of the table, a field (S_URL) for storing the URL, a field (S_TITLE) for storing the title of the web page, and the URL in the database. The registered date (D_REGISTER), the date of updating the URL information (D_UPDATE), and the like.

図５は、フィールドマスタの構成を示す図である。フィールドマスタは、各ウェブページ上のユーザインターフェース部品を定義する。ＵＲＬマスタは、サーバ１において、ユーザインターフェース部品の定義情報が解析された結果生成されるテーブルである。 FIG. 5 is a diagram showing the configuration of the field master. The field master defines user interface components on each web page. The URL master is a table generated as a result of analyzing user interface component definition information in the server 1.

図５のように、フィールドマスタの各行の先頭には、ＵＲＬマスタのＬ＿ＩＤが指定されている。したがって、フィールドマスタの各行は、ＵＲＬマスタのいずれかの行と関連づけされる。 As shown in FIG. 5, the L_ID of the URL master is specified at the top of each line of the field master. Thus, each row in the field master is associated with any row in the URL master.
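
The link between the two tables can be illustrated with a short sketch. The column names L_ID, S_URL, S_TITLE, S_WRITEFORM, and I_FIELD come from the description; the Python classes, the reduced column set, the meaning assigned to I_FIELD, and the sample URL are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UrlRow:
    l_id: int         # L_ID: row identifier
    s_url: str        # S_URL: URL of the web page
    s_title: str      # S_TITLE: title of the web page


@dataclass(frozen=True)
class FieldRow:
    l_id: int         # L_ID of the URL master row this component belongs to
    i_field: int      # I_FIELD: assumed here to identify the component
    s_title: str      # S_TITLE: field name
    s_writeform: str  # S_WRITEFORM: value associated with the component


def fields_for_url(url, url_master, field_master):
    """Return the field master rows linked, via L_ID, to the given URL."""
    ids = {row.l_id for row in url_master if row.s_url == url}
    return [f for f in field_master if f.l_id in ids]


# Hypothetical sample data.
URL_MASTER = [UrlRow(1, "https://shop.example/order", "注文")]
FIELD_MASTER = [
    FieldRow(1, 10, "商品選択", "コーヒー"),
    FieldRow(1, 10, "商品選択", "大豆"),
    FieldRow(2, 20, "検索", ""),
]
```

A URL that has no row in the URL master yields an empty list, which matches the idea that only registered pages are targets of voice input.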

また、フィールドマスタは、カーソル移動語（そのユーザインターフェース部品のフィールド名、Ｓ＿ＴＩＴＬＥ）、データ型（Ｓ＿ＦＩＥＬＤ＿ＴＹＰＥ、Ｓ＿ＴＡＧ＿ＴＹＰＥ）、属性（Ｓ＿ＦＩＥＬＤ＿ＩＮＦＯ）、そのフィールドに設定すべき値が価格であった場合の商品単価やフィールドの書式（Ｓ＿ＵＮＩＴ、Ｓ＿ＦＯＲＭＡＴ）、そのフィールドから抽出された値（Ｓ＿ＤＥＦＡＵＬＴＦＯＲＭおよびＳ＿ＷＲＩＴＥＦＯＲＭ）等を含んでいる。 The field master also contains the cursor movement word (the field name of the user interface component, S_TITLE), the data type (S_FIELD_TYPE, S_TAG_TYPE), the attribute (S_FIELD_INFO), the product unit price used when the value to be set in the field is a price and the format of the field (S_UNIT, S_FORMAT), the values extracted from the field (S_DEFAULTFORM and S_WRITEFORM), and so on.

このうち、Ｓ＿ＤＥＦＡＵＬＴＦＯＲＭは表記用文字列である。音声文字変換ツールとのインターフェース部分において、ユーザインターフェース部品の文字列に半角・全角スペースがあると認識不可となってしまう場合がある。そこで、表記用文字列から半角・全角スペースを削除したものがＳ＿ＷＲＩＴＴＥＮＦＯＲＭである Of these, S_DEFAULTFORM is the character string for notation. At the interface with the voice character conversion tool, recognition may fail if the character string of a user interface component contains half-width or full-width spaces. S_WRITTENFORM is therefore the notation character string with the half-width and full-width spaces removed.

なお、フィールドから抽出された値（Ｓ＿ＤＥＦＡＵＬＴＦＯＲＭおよびＳ＿ＷＲＩＴＥＦＯＲＭ）は、ユーザインターフェース部品がテキスト入力フィールドである場合には、入力する文字列が固定である場合を除いて空欄であり、ユーザインターフェース部品がプルダウンメニュの選択肢である場合には、その要素の値であり、ユーザインターフェース部品が押しボタンである場合には、そのラベルであり、ユーザインターフェース部品がウィンドウやプルダウンメニュのタイトルである場合には、そのタイトル文字列である。このフィールドから抽出された値（Ｓ＿ＤＥＦＡＵＬＴＦＯＲＭおよびＳ＿ＷＲＩＴＥＦＯＲＭ）は、ユーザインターフェース部品に対応付けられる値と呼ぶ。 The values extracted from the field (S_DEFAULTFORM and S_WRITEFORM) depend on the component: when the user interface component is a text input field, the value is blank unless the character string to be input is fixed; when it is an option of a pull-down menu, it is the value of that element; when it is a push button, it is the label; and when it is the title of a window or pull-down menu, it is the title character string. The values extracted from the field (S_DEFAULTFORM and S_WRITEFORM) are referred to as the values associated with the user interface component.
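
The space removal that derives the second form from S_DEFAULTFORM is simple enough to show directly. This is a minimal sketch; the function name `to_write_form` is hypothetical, and it assumes that only U+0020 (half-width space) and U+3000 (full-width space) need to be removed.

```python
HALF_WIDTH_SPACE = "\u0020"
FULL_WIDTH_SPACE = "\u3000"


def to_write_form(default_form: str) -> str:
    """Drop half-width and full-width spaces from the notation string."""
    return default_form.replace(HALF_WIDTH_SPACE, "").replace(FULL_WIDTH_SPACE, "")
```

For instance, a label written as 「商品 選択」 with a half-width space would be stored as 「商品選択」.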

図６は、発話マスタの構成を示す図である。発話マスタは、ユーザインターフェース部品それぞれに対応付けられる値（Ｓ＿ＷＲＩＴＥＦＯＲＭ）に対応する読み（Ｓ＿ＳＰＯＫＥＮＦＯＲＭ）を定義する。例えば、「コーヒー」に対して「こーひー」が対応付けられ、「商品選択」に対して「しょうひんせんたく」が対応付けられる。 FIG. 6 is a diagram showing the configuration of the utterance master. The utterance master defines the reading (S_SPOKENFORM) corresponding to the value (S_WRITEFORM) associated with each user interface component. For example, the reading “こーひー” (kōhī) is associated with “コーヒー” (coffee), and the reading “しょうひんせんたく” (shōhin sentaku) is associated with “商品選択” (product selection).

なお、ユーザインターフェース部品それぞれに対応付けられる値の１つに対して、複数の読みを設定してよい。例えば、「四菱プラズマテレビ５０インチＹＰＴ−５０」という値に対して、「ごじゅういんちぷらずま」、「ごじゅういんちぷらずまてれび」、「よんびしぷらずまてれびごじゅういんち」等が設定される。例えば、本実施形態の情報システムがインターネットショッピングのユーザインターフェースに対して、音声認識機能を追加する場合、商品名である「四菱プラズマテレビ５０インチＹＰＴ−５０」に対して、サービスを利用するエンドユーザは、様々な読みを発話することが想定される。発話マスタには、値（Ｓ＿ＷＲＩＴＥＦＯＲＭ）に対して想定される読みを数多く設定しておけばよい。 A plurality of readings may be set for a single value associated with a user interface component. For example, for the value “四菱プラズマテレビ５０インチＹＰＴ−５０” (Yonbishi plasma TV, 50 inch, YPT-50), readings such as “ごじゅういんちぷらずま” (gojū inchi purazuma), “ごじゅういんちぷらずまてれび” (gojū inchi purazuma terebi), and “よんびしぷらずまてれびごじゅういんち” (yonbishi purazuma terebi gojū inchi) are set. When the information system of this embodiment adds a voice recognition function to the user interface of an Internet shopping site, end users of the service can be expected to utter a variety of readings for the product name “四菱プラズマテレビ５０インチＹＰＴ−５０”. It suffices to set in the utterance master as many of the readings expected for a value (S_WRITEFORM) as possible.
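
The many-readings-to-one-value relationship can be sketched as an index keyed by reading. The rows reuse the readings given in the text; the list-of-pairs layout and the function name `build_reading_index` are assumptions, not the actual table format.

```python
from collections import defaultdict

PRODUCT = "四菱プラズマテレビ５０インチＹＰＴ−５０"

# (S_SPOKENFORM, S_WRITEFORM) pairs as they might appear in the utterance master.
UTTERANCE_ROWS = [
    ("こーひー", "コーヒー"),
    ("ごじゅういんちぷらずま", PRODUCT),
    ("ごじゅういんちぷらずまてれび", PRODUCT),
    ("よんびしぷらずまてれびごじゅういんち", PRODUCT),
]


def build_reading_index(rows):
    """Map each reading to the set of written values it may stand for."""
    index = defaultdict(set)
    for spoken, written in rows:
        index[spoken].add(written)
    return index


INDEX = build_reading_index(UTTERANCE_ROWS)
```

Using a set per reading leaves room for a reading that is shared by more than one written value, a case the description does not rule out.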

さらに、図６のように、各行には、読みの設定日付のフィールド（Ｄ＿ＲＥＧＩＳＴＥＲ）が設けられている。 Further, as shown in FIG. 6, each row is provided with a field (D_REGISTER) for reading setting date.

図７は、移動語マスタと呼ばれるテーブルの構成を示す図である。移動語マスタは、ユーザインターフェース部品それぞれに対応付けられる値のうち、移動語として利用される値を定義するテーブルである。移動語とは、その発話結果から変換された文字列が移動語マスタに値（Ｓ＿ＷＲＩＴＥＦＯＲＭ）として登録されていた場合、その値に対応するユーザインターフェース部品にポインティングデバイスのポインタが移動する。すなわち、そのユーザインターフェース部品が選択状態（フォーカスされた状態）となる。図７のように、移動語マスタは、値（Ｓ＿ＷＲＩＴＥＦＯＲＭ）、読み（Ｓ＿ＳＰＯＫＥＮＦＯＲＭ）、およびデータ登録日付（Ｄ＿ＲＥＧＩＳＴＥＲ）が組になって格納する。 FIG. 7 is a diagram showing the configuration of a table called the movement word master. The movement word master is a table that defines, among the values associated with the user interface components, the values used as movement words. A movement word works as follows: when the character string converted from an utterance is registered as a value (S_WRITEFORM) in the movement word master, the pointer of the pointing device moves to the user interface component corresponding to that value. That is, the user interface component enters the selected (focused) state. As shown in FIG. 7, the movement word master stores the value (S_WRITEFORM), the reading (S_SPOKENFORM), and the data registration date (D_REGISTER) as a set.

図８は、予約語マスタを示す図である。予約語マスタは、システムが、サーバ１にて使用される前事前に予約された値（Ｓ＿ＷＲＩＴＥＦＯＲＭ）と読み（Ｓ＿ＳＰＯＫＥＮＦＯＲＭ）との関係を定義するテーブルである。 FIG. 8 is a diagram showing the reserved word master. The reserved word master is a table that defines the relationship between values (S_WRITEFORM) reserved in advance, before the system is used on the server 1, and their readings (S_SPOKENFORM).

予約語マスタには、例えば、電話番号、ＦＡＸ番号等、市内局番等、使用頻度が高く、読み方がほとんど決まっている文字列について読みが定義される。 In the reserved word master, readings are defined for character strings that are used frequently and whose readings are largely fixed, such as telephone numbers, FAX numbers, and local exchange numbers.

例えば、「今日」という文字列が入力されると、予約語マスタに存在する場合、クライアント２の日付を取得し、その日付を入力する。例えば、クライアント２の日付が２００５年１２月１２日で「明日」と発話した場合、本日の日付を取得し、１日加算し、「２００５/１２/１３」を入力する。（間の／は、図１０の属性マスタの定義によるものとする）。 For example, when the character string “今日” (today) is input and it exists in the reserved word master, the engine acquires the date of the client 2 and inputs that date. Likewise, when the date of the client 2 is December 12, 2005 and the user utters “明日” (tomorrow), the engine acquires today's date, adds one day, and inputs “2005/12/13” (the “/” separator follows the definition in the attribute master of FIG. 10).
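
The date arithmetic in this example can be sketched as follows. The mapping `DATE_WORDS`, the function name, and the `sep` parameter are hypothetical; in the described system the separator would come from the attribute master rather than a default argument.

```python
from datetime import date, timedelta

# Reserved date words and their offsets in days (assumed contents).
DATE_WORDS = {"今日": 0, "明日": 1}


def resolve_date_word(word, today, sep="/"):
    """Return the formatted date for a reserved date word, or None if unknown."""
    offset = DATE_WORDS.get(word)
    if offset is None:
        return None
    d = today + timedelta(days=offset)
    return f"{d.year}{sep}{d.month:02d}{sep}{d.day:02d}"
```

Passing the client date in as `today` keeps the function deterministic; a caller would normally supply `date.today()`.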

図９は、単位マスタを示す図である。図９のテーブルは文字列を定義する際の単位の一覧を表している。具体的にはＳ＿ＤＩＳＰＬＡＹは文字列の単位を表し、Ｓ＿ＡＴＴＲは図１０のＳ＿ＡＴＴＲとリンクされ、その文字列の書式属性を表している。 FIG. 9 is a diagram showing the unit master. The table of FIG. 9 lists the units used when defining character strings. Specifically, S_DISPLAY represents the unit of a character string, and S_ATTR is linked to S_ATTR in FIG. 10 and represents the format attribute of that character string.

図１０は、属性マスタを示す図である。図９の単位マスタにより文字列の単位が定義され、図１０の属性マスタによりその文字列の表示属性が定義される。また、図１０の属性マスタは、図５のフィールドマスタとＳ＿ＦＯＲＭＡＴによりリンクされている。すなわち、各ユーザインターフェース部品に表示される文字列の表示書式は、フィールドマスタのＳ＿ＦＯＲＭＡＴを基に、図１０の属性マスタが検索され、決定される。 FIG. 10 is a diagram showing the attribute master. The unit of a character string is defined by the unit master of FIG. 9, and the display attribute of that character string is defined by the attribute master of FIG. 10. The attribute master of FIG. 10 is linked to the field master of FIG. 5 by S_FORMAT. That is, the display format of the character string displayed in each user interface component is determined by searching the attribute master of FIG. 10 based on S_FORMAT of the field master.

＜実施例＞ <Example>
図１１から図１５の図面により、本情報システムによる実施例を説明する。本実施例では、インターネットのショッピングサイトに対して音声認識機能を追加する例を説明する。 An example of the information system is described with reference to FIGS. 11 to 15. This example describes adding a voice recognition function to an Internet shopping site.

図１１は、サーバ１においてユーザインターフェース部品（フィールドともいう）に対して音声入力を対応付ける操作を示す図である。図１１には、サーバ１で実行される定義ツール（Voice Moderato Translator（商標））の操作画面（ウィンドウともいう）１０が示されている。この操作画面１０は、画面の略左半分の領域にウェブページ表示部１１を有している。このウェブページ表示部１１には、音声認識機能を追加するユーザインターフェース、例えば、ウェブアプリケーションのウェブページが表示される。サーバ１のユーザが、例えば、音声認識機能を追加したいウェブページをウェブページ表示部１１にドラッグアンドドロップすることで、そのウェブページが表示される。 FIG. 11 is a diagram illustrating an operation, on the server 1, of associating voice input with a user interface component (also referred to as a field). FIG. 11 shows an operation screen (also referred to as a window) 10 of the definition tool (Voice Moderato Translator (trademark)) executed on the server 1. The operation screen 10 has a web page display unit 11 in roughly the left half of the screen. The web page display unit 11 displays the user interface to which the voice recognition function is to be added, for example, a web page of a web application. The web page is displayed when, for example, the user of the server 1 drags and drops the web page to which the voice recognition function is to be added onto the web page display unit 11.

また、画面の略右半分は、ウェブページ表示部１１に表示されたウェブページの解析結果および発話情報の設定領域となっている。すなわち、操作画面１０は、オブジェクト階層表示部１２、音声化対象ＵＲＬ表示部１３、認識語登録部１４、読み設定部１６のそれぞれの領域を有している。 The substantially right half of the screen is a setting area for analysis results and speech information of the web page displayed on the web page display unit 11. That is, the operation screen 10 has respective areas of an object hierarchy display unit 12, an audio target URL display unit 13, a recognition word registration unit 14, and a reading setting unit 16.

オブジェクト階層表示部１２は、処理対象に指定されたウェブページ、すなわち、ウェブページ表示部１１に表示されたウェブページを解析し、そのウェブページ上のユーザインターフェース部品（図１１では、オブジェクトともいう）の関係を階層的に表示する。一般的に、ユーザインターフェースは、ユーザインターフェース部品の階層的な組み合わせによって構成される。また、ユーザインターフェース部品は、複数の下位部品の階層的は組み合わせによって構成される。オブジェクト階層表示部１２は、処理対象のウェブページの階層構造を示す。 The object hierarchy display unit 12 analyzes a web page designated as a processing target, that is, a web page displayed on the web page display unit 11, and a user interface component on the web page (also referred to as an object in FIG. 11). Displays the relationship in a hierarchical manner. Generally, a user interface is configured by a hierarchical combination of user interface components. In addition, the user interface component is configured by a hierarchical combination of a plurality of lower components. The object hierarchy display part 12 shows the hierarchical structure of the web page to be processed.

例えば、ユーザインターフェースは、最上位にフォームと呼ばれるウィンドウ領域が定義され、フォーム上に、テキストボックス（テキスト入力フィールド）、プルダウンメニュ、チェックボタン等を配置して構成される。また、テキストボックスは、一般的には、タイトルを示すラベルと文字列入力フィールドを含む。また、プルダウンメニュは、タイトルを示すラベルと選択肢を示すリストと、リストを構成する要素の並びで構成される。 For example, the user interface is defined by defining a window area called a form at the top, and arranging a text box (text input field), a pull-down menu, a check button, and the like on the form. The text box generally includes a label indicating a title and a character string input field. The pull-down menu is composed of a label indicating a title, a list indicating options, and an arrangement of elements constituting the list.

音声化対象ＵＲＬ表示部１３は、音声認識機能を追加するウェブページを示すＵＲＬが、そのウェブページのタイトルとともに表示される。このＵＲＬは、例えば、ユーザがウェブページをウェブページ表示部１１にドラッグアンドドロップすることにより、定義ツールが取得する。タイトルは、ＵＲＬが示す定義ファイル（ＨＴＭＬ、ＸＭＬ等）から抽出される。 The audio target URL display unit 13 displays the URL of the web page to which the voice recognition function is to be added, together with the title of that web page. The definition tool acquires this URL when, for example, the user drags and drops the web page onto the web page display unit 11. The title is extracted from the definition file (HTML, XML, etc.) indicated by the URL.

認識語登録部１４は、処理対象のユーザインターフェース部品に、発話によって認識すべき文字列を対応付けて登録する。例えば、図１１では、ウェブページ上の「商品」というタイトルで示される箇所で、「商品選択」というタイトルのプルダウンメニュが操作されている。 The recognition word registration unit 14 registers a character string to be recognized by utterance in association with the user interface component to be processed. For example, in FIG. 11, a pull-down menu titled “Product Selection” is operated at a location indicated by the title “Product” on the web page.

このとき、オブジェクト階層表示部１２は、「商品選択」というプルダウンメニュが処理中であることが色（図１１上では黒く見える）で示され、認識語登録部１４には、タイトルが「商品」であり、データ型が「選択」すなわち、選択肢を含むユーザインターフェース部品であることが示される。 At this time, the object hierarchy display unit 12 indicates that the pull-down menu “product selection” is being processed in color (looks black in FIG. 11), and the recognition word registration unit 14 has the title “product”. It is indicated that the data type is “selection”, that is, a user interface component including options.

図１１のように、オブジェクト階層表示部１２は、移動語登録ボタン１５を有している。ユーザが移動語登録ボタンを押下すると、移動語登録画面が表示される。移動語登録ボタン１５は、タイトルに表示された文字列を移動語として設定するボタンである。 As shown in FIG. 11, the object hierarchy display unit 12 has a movement word registration button 15. When the user presses the movement word registration button, the movement word registration screen is displayed. The movement word registration button 15 is a button for setting the character string displayed in the title as a movement word.

図１２に移動語登録画面を示す。移動後登録画面には、移動語に設定する文字列とその読みが組となって表示される。例えば、「商品選択」（読み「しょうひんせんたく」）という文字列が移動語として登録されると、この処理対象のウェブページが表示されている状態で、「しょうひんせんたく」という音声が発話されると、「商品選択」のタイトルの付されたプルダウンメニュがフォーカス状態になる。 FIG. 12 shows the movement word registration screen. The movement word registration screen displays, as a set, the character string to be set as a movement word and its reading. For example, once the character string “商品選択” (product selection; reading “しょうひんせんたく”, shōhin sentaku) is registered as a movement word, uttering “しょうひんせんたく” while the web page being processed is displayed puts the pull-down menu titled “商品選択” into the focused state.

図１２において、読み設定部１５には、処理対象のユーザインターフェース部品に設定すべき、または、ユーザインターフェース部品を操作するときに使用する文字列（入力文字）を定義する。ここでは、例えば、「商品選択」というプルダウンメニュのタイトルである文字列「商品選択」に対する発話音声「しょうひんせんたく」が定義される。また、プルダウンメニュの選択肢である、「コーヒー」に対する「こーひー」、「大豆」に対する「だいず」等が設定される。このような設定により、「しょうひんせんたく」が発話されると、「商品選択」というタイトルのプルダウンメニュがフォーカスされ、その状態で、「こーひー」と発話されると、「コーヒー」という選択肢が選択されることになる。すでに述べたように、入力文字に対して複数の読みを設定しても構わない。設定後、ユーザが、更新ボタン１７を押下すると、設定内容が、ＵＲＬともに、データベースに格納される。 In FIG. 12, the reading setting unit 15 defines the character strings (input characters) to be set in the user interface component being processed, or to be used when operating that user interface component. Here, for example, the utterance “しょうひんせんたく” (shōhin sentaku) is defined for the character string “商品選択” (product selection), which is the title of the pull-down menu. In addition, for the pull-down menu options, “こーひー” (kōhī) is set for “コーヒー” (coffee), “だいず” (daizu) for “大豆” (soybeans), and so on. With these settings, when “しょうひんせんたく” is uttered, the pull-down menu titled “商品選択” is focused, and when “こーひー” is uttered in that state, the option “コーヒー” is selected. As already described, a plurality of readings may be set for an input character. After the settings are made, when the user presses the update button 17, the settings are stored in the database together with the URL.

ユーザは、以上のような設定をウェブページのそれぞれのユーザインターフェース部品に対して実行する。このような設定のなされたユーザインターフェース部品が音声認識の処理対象となる。 The user executes the above settings for each user interface component of the web page. The user interface component set as described above is a speech recognition processing target.

図１３に、インターネットショッピングを利用するエンドユーザのクライアント２上での処理例とこの処理に関係するクライアント２のアプリケーションプログラムを示す。 FIG. 13 shows an example of processing on the client 2 of an end user who uses Internet shopping and an application program of the client 2 related to this processing.

クライアント２には、すでに、ブラウザ２０、エンジン２１，音声文字変換ツール２２がインストールされている。また、ブラウザ２０およびエンジン２１は、クライアント２上で実行中であるとする。 In the client 2, a browser 20, an engine 21, and a voice character conversion tool 22 are already installed. Further, it is assumed that the browser 20 and the engine 21 are being executed on the client 2.

図１３では、ブラウザ２０は、インターネットショッピングサイトを表示している。このインターネットショッピングサイトの音声入力を定義するデータベースは、クライアント２が最初にインターネットショッピングサイトにアクセスしたときにダウンロードされる。また、例えば、エンジン２１をインストールするときに、最新のデータベースをサーバ１からダウンロードするようにしてもよい。 In FIG. 13, the browser 20 displays an Internet shopping site. The database defining the voice input of the Internet shopping site is downloaded when the client 2 first accesses the Internet shopping site. For example, the latest database may be downloaded from the server 1 when the engine 21 is installed.

エンジン２１は、起動されると常時、ブラウザ２０が表示するウェブページを示すＵＲＬを監視している。そして、エンジン２１は、ブラウザ２０が表示するＵＲＬがデータベースのＵＲＬマスタに登録されているか否かを判定する。そして、エンジン２１は、そのＵＲＬがデータベースのＵＲＬマスタに登録されていている場合、そのＵＲＬが音声認識処理の対象であると判定する。すると、エンジン２１は、そのＵＲＬで示される定義ファイル（ＨＴＭＬ、ＸＭＬ等）を読み出し、ブラウザ２０が表示するウェブページと同様の疑似画面を生成し、ブラウザ２０の表示に重畳して表示する。したがって、エンドユーザから見ると、あたかも、ブラウザ２０によってウェブページが表示されているように見える。 When the engine 21 is activated, the engine 21 monitors a URL indicating a web page displayed by the browser 20. Then, the engine 21 determines whether the URL displayed by the browser 20 is registered in the URL master of the database. If the URL is registered in the URL master of the database, the engine 21 determines that the URL is a target for voice recognition processing. Then, the engine 21 reads the definition file (HTML, XML, etc.) indicated by the URL, generates a pseudo screen similar to the web page displayed by the browser 20, and displays the pseudo screen superimposed on the display of the browser 20. Therefore, when viewed from the end user, it looks as if the web page is displayed by the browser 20.

この状態で、エンドユーザが音声を発話すると、その音声がマイクロホン、入出力インターフェースを通じて、音声データとしてクライアント２の実行する音声文字変換ツール２２に取り込まれる。音声文字変換ツール２２は、その音声データを音素分析し、音声データをＡＳＣＩＩコード列に変換する。さらに、音声文字変換ツール２２は、辞書を検索し、ＡＳＣＩＩコード列を単語（または形態素）に分解し、辞書と照合する。そして、音声文字変換ツール２２は、単語（または形態素）の並びであるテキストを生成し、引数を通じてエンジン２１に引き渡す。 In this state, when the end user speaks a voice, the voice is taken into the voice character conversion tool 22 executed by the client 2 as voice data through the microphone and the input / output interface. The phonetic character conversion tool 22 performs phoneme analysis on the phonetic data and converts the phonetic data into an ASCII code string. Further, the phonetic character conversion tool 22 searches the dictionary, decomposes the ASCII code string into words (or morphemes), and collates with the dictionary. Then, the phonetic character conversion tool 22 generates a text that is a sequence of words (or morphemes) and passes it to the engine 21 through an argument.

エンジン２１は、テキスト中の単語（または形態素）からデータベースの予約語マスタを検索し、発話された音声に該当する入力文字とその入力文字を入力すべきユーザインターフェース部品を決定する。あるいは、移動語マスタを検索して、発話された音声によって選択対象とすべきユーザインターフェース部品を決定する。あるいは、発話マスタおよびフィールドマスタを検索し、発話された音声に該当する入力文字とその入力文字を入力すべきユーザインターフェース部品を決定する。そして、その入力文字を該当するユーザインターフェース部品に設定し、表示装置（ディスプレイ）上のウェブページの疑似画面に表示する。 Using the words (or morphemes) in the text, the engine 21 searches the reserved word master of the database and determines the input characters corresponding to the uttered voice and the user interface component into which those input characters are to be entered. Alternatively, it searches the movement word master and determines the user interface component to be selected by the uttered voice. Alternatively, it searches the utterance master and the field master and determines the input characters corresponding to the uttered voice and the user interface component into which those input characters are to be entered. The engine then sets the input characters in the corresponding user interface component and displays them on the pseudo screen of the web page on the display device.
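
The three lookups can be sketched as one function over plain dictionaries. The description presents them as alternatives and does not fix an order, so the order used here (reserved words, then movement words, then utterance master plus field master) is an assumption, as are the dictionary layouts and the sample entries.

```python
def resolve(word, reserved, movement, utterance, fields):
    """Classify one recognized word against the three masters."""
    if word in reserved:
        return ("reserved", reserved[word])
    if word in movement:
        # A movement word focuses the component it names.
        return ("focus", movement[word])
    written = utterance.get(word)
    if written in fields:
        # The written value is entered into the component that owns it.
        return ("input", written, fields[written])
    return None


# Hypothetical sample data: reading -> written value, written value -> field title.
RESERVED = {"ゆうびんばんごう": "郵便番号"}
MOVEMENT = {"しょうひんせんたく": "商品選択"}
UTTERANCE = {"こーひー": "コーヒー"}
FIELDS = {"コーヒー": "商品選択"}
```

Returning `None` for an unknown word lets the caller ignore utterances that are not defined for the current page.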

図１４に、音声入力によって設定されたウェブページの例を示す。例えば、エンジン２１が起動中に、エンドユーザが図１４のウェブページをブラウザで表示すると、エンジン２１は、そのＵＲＬがデータベースのＵＲＬマスタに登録されていることを検知する。そして、エンジン２１は、そのＵＲＬによりウェブページの構成を読みとり、ウェブページの疑似画面をブラウザに重畳して表示する。 FIG. 14 shows an example of a web page set by voice input. For example, when the end user displays the web page of FIG. 14 with the browser while the engine 21 is running, the engine 21 detects that the URL is registered in the URL master of the database. The engine 21 reads the configuration of the web page based on the URL, and displays the web page pseudo screen superimposed on the browser.

そして、例えば、エンドユーザが「ちゅうもんないよう」と発話すると、疑似画面中の「注文内容」部分がフォーカスされる。ここで、例えば、「うけつけばんごうはいちにさんし」と発話すると、音声文字変換ツールによって「うけつけばんごう」「は」「いちにさんし」に変換される。エンジン２１は、「うけつけばんごう」によって、発話マスタを検索し、「うけつけばんごう」を「受付番号」に変換する。さらに、エンジン２１は、フィールドマスタを検索し、フィールド「受付番号」を決定し、そのフィールドを識別する情報（図５のＩ＿ＦＩＥＬＤとＩ＿ＶＡＬＵＥの値）を取得する。また、エンジン２１は、「は」の後の「いちにさんし」によって「１２３４」を決定し、「受付番号」のフィールドに「１２３４」を設定する。 Then, for example, when the end user utters “ちゅうもんないよう” (chūmon naiyō), the “注文内容” (order contents) portion of the pseudo screen is focused. Here, for example, when the user utters “うけつけばんごうはいちにさんし” (uketsuke bangō wa ichi ni san shi), the voice character conversion tool converts it into “うけつけばんごう”, “は”, and “いちにさんし”. The engine 21 searches the utterance master with “うけつけばんごう” and converts it into “受付番号” (reception number). The engine 21 then searches the field master, determines the field “受付番号”, and acquires the information identifying that field (the values of I_FIELD and I_VALUE in FIG. 5). The engine 21 also determines “1234” from “いちにさんし”, which follows “は”, and sets “1234” in the “受付番号” field.
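
The reception number example can be traced with a short sketch. The description does not say how digit readings are turned into digits, so the longest-match scan and the `DIGIT_READINGS` table below are assumptions, and both function names are hypothetical.

```python
# Kana readings of single digits (assumed table; alternative readings included).
DIGIT_READINGS = {
    "ぜろ": "0", "れい": "0", "いち": "1", "に": "2", "さん": "3",
    "よん": "4", "し": "4", "ご": "5", "ろく": "6", "なな": "7",
    "しち": "7", "はち": "8", "きゅう": "9",
}
_KEYS = sorted(DIGIT_READINGS, key=len, reverse=True)  # try longer readings first


def kana_to_digits(kana: str) -> str:
    """Convert a run of digit readings to digits, skipping anything else."""
    out, i = [], 0
    while i < len(kana):
        for key in _KEYS:
            if kana.startswith(key, i):
                out.append(DIGIT_READINGS[key])
                i += len(key)
                break
        else:
            i += 1  # not a digit reading: skip one character
    return "".join(out)


def parse_field_utterance(tokens, utterance):
    """Split [reading, "は", digit reading] into (field name, digit string)."""
    reading, particle, rest = tokens
    if particle != "は":
        raise ValueError("expected the particle は between field and value")
    return utterance[reading], kana_to_digits(rest)
```

The token list is taken as already segmented, since the text attributes that step to the voice character conversion tool.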

また、例えば、エンドユーザが「こーひー」と発話すると、音声文字変換ツールによって「こーひー」に変換される。エンジン２１は、「こーひー」を基に、発話マスタ（図６）を参照し、文字列「コーヒー」を取得する。次に、エンジン２１は、「コーヒー」を基に、ウェブページに対応する（Ｌ＿ＩＤでＵＲＬマスタとリンクされる）フィールドマスタ（図５）を参照し、「コーヒー」を設定すべきユーザインターフェース部品（図５のＩ＿ＦＩＥＬＤとＩ＿ＶＡＬＵＥの値で識別される）と、そのユーザインターフェース部品が表示されるウェブページのＵＲＬ（図５のＬ＿ＩＤの値によって定まる図４のＵＲＬマスタの行のＵＲＬ）を決定する。 Also, for example, when the end user utters “こーひー” (kōhī), the voice character conversion tool converts it into “こーひー”. Based on “こーひー”, the engine 21 refers to the utterance master (FIG. 6) and acquires the character string “コーヒー” (coffee). Next, based on “コーヒー”, the engine 21 refers to the field master (FIG. 5) corresponding to the web page (linked to the URL master by L_ID) and determines the user interface component in which “コーヒー” is to be set (identified by the values of I_FIELD and I_VALUE in FIG. 5) and the URL of the web page on which that user interface component is displayed (the URL in the row of the URL master of FIG. 4 determined by the value of L_ID in FIG. 5).

図１４は、郵便番号と電話番号の入力例を示す図である。郵便番号に関しては例えば、エンドユーザが「ゆうびんばんごういちにさんのよんごろくなな」と発話すると、音声文字変換ツールによって、「ゆうびんばんごう」「いちにさん」「の」「よんごろくなな」に変換される。エンジン２１は、「ゆうびんばんごう」によって予約後マスタ（または、発話マスタ）を検索し、「郵便番号」を検知する。さらに、エンジン２１は、フィールドマスタの属性を検索し、フィールド「郵便番号」を決定する。また、フィールド「郵便番号」が複数存在する場合、ウェブページの疑似画面の現在位置をＨＴＭＬファイルあるいはＸＭＬファイル等のウェブページを構成するファイルから取得し、その位置以降の最初のフィールド「郵便番号」に決定する。 FIG. 14 is a diagram illustrating an example of inputting a postal code and a telephone number. For the postal code, for example, when the end user utters “ゆうびんばんごういちにさんのよんごろくなな” (yūbin bangō ichi ni san no yon go roku nana), the voice character conversion tool converts it into “ゆうびんばんごう”, “いちにさん”, “の”, and “よんごろくなな”. The engine 21 searches the reserved word master (or the utterance master) with “ゆうびんばんごう” and detects “郵便番号” (postal code). The engine 21 then searches the attributes of the field master and determines the field “郵便番号”. If there are a plurality of “郵便番号” fields, the engine acquires the current position in the pseudo screen of the web page from the files constituting the web page, such as the HTML file or XML file, and selects the first “郵便番号” field from that position onward.

また、後の「いちにさん」「の」「よんごろくなな」を取得し、数字以外の文字を除外し、数字の羅列を生成する。フィールドマスタで取得した属性を基に書式変換し、「１２３‐４５６７」をフィールド「郵便番号」に設定する。 The engine also takes the subsequent “いちにさん”, “の”, and “よんごろくなな”, excludes the non-numeric characters, and generates a sequence of digits. It converts the format based on the attribute acquired from the field master and sets “123-4567” in the field “郵便番号”.

電話番号に関しては例えば、エンドユーザが「でんわばんごういちにのさんよんごろくのななはちきゅうぜろ」と発話すると、音声文字変換ツールによって、「でんわばんごう」「いちに」「の」「さんよんごろく」「の」「ななはちきゅうぜろ」に変換される。エンジン２１は、「でんわばんごう」によって予約後マスタ（または、発話マスタ）を検索し、「電話番号」に変換する。さらに、エンジン２１は、フィールドマスタの属性を検索し、フィールド「電話番号」を決定する。また、フィールド「電話番号」が複数存在する場合、ウェブページの疑似画面の現在位置を取得し、その位置以降の最初のフィールド「電話番号」に決定する。また、後の「いちに」「の」「さんよんごろく」「の」「ななはちきゅうぜろ」を取得し、数字以外の文字を除外し、数字の羅列を生成する。フィールドマスタで取得した属性を基に書式変換し、「１２‐３４５６−７８９０」をフィールド「電話番号」に設定する。 For the telephone number, for example, when the end user utters “でんわばんごういちにのさんよんごろくのななはちきゅうぜろ” (denwa bangō ichi ni no san yon go roku no nana hachi kyū zero), the voice character conversion tool converts it into “でんわばんごう”, “いちに”, “の”, “さんよんごろく”, “の”, and “ななはちきゅうぜろ”. The engine 21 searches the reserved word master (or the utterance master) with “でんわばんごう” and converts it into “電話番号” (telephone number). The engine 21 then searches the attributes of the field master and determines the field “電話番号”. If there are a plurality of “電話番号” fields, the engine acquires the current position in the pseudo screen of the web page and selects the first “電話番号” field from that position onward. The engine also takes the subsequent “いちに”, “の”, “さんよんごろく”, “の”, and “ななはちきゅうぜろ”, excludes the non-numeric characters, and generates a sequence of digits. It converts the format based on the attribute acquired from the field master and sets “12-3456-7890” in the field “電話番号”.
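
The final two steps, dropping non-numeric characters and applying the field format, can be sketched as below. The contents of S_FORMAT are not described, so the `#` placeholder syntax is purely an assumption, as are the function names.

```python
def digits_only(text: str) -> str:
    """Keep only ASCII digits, dropping particles such as の."""
    return "".join(ch for ch in text if ch in "0123456789")


def apply_format(digits: str, pattern: str) -> str:
    """Fill each '#' in the pattern with the next digit; copy other characters."""
    remaining = iter(digits)
    out = []
    for ch in pattern:
        if ch == "#":
            d = next(remaining, None)
            if d is None:
                break  # ran out of digits before the pattern ended
            out.append(d)
        else:
            out.append(ch)
    return "".join(out)
```

With a pattern of `###-####` the postal code example yields the hyphenated form, and `##-####-####` does the same for the telephone number example.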

＜実施形態の効果＞ <Effect of embodiment>
以上述べたように、本実施形態の情報システムによれば、サーバ１の発話定義ツールは、ユーザインターフェースを構成ウェブページを解析し、そのウェブページを構成するユーザインターフェース部品の階層構造を抽出する。そして、発話定義ツールは、各ユーザインターフェース部品に対応する読みの入力文字を受け付け、各ユーザインターフェース部品の属するＵＲＬおよび各ユーザインターフェース部品を識別する識別情報（図５のＬ＿ＩＤ、Ｉ＿ＦＩＥＬＤ、Ｉ＿ＶＡＬＵＥ等の値）とともにデータベースに格納する。 As described above, according to the information system of the present embodiment, the utterance definition tool of the server 1 analyzes a web page constituting a user interface and extracts the hierarchical structure of the user interface components constituting that web page. The utterance definition tool then accepts the input characters of the reading corresponding to each user interface component and stores them in the database together with the URL to which each user interface component belongs and the identification information identifying each user interface component (the values of L_ID, I_FIELD, I_VALUE, and so on in FIG. 5).

一方、エンドユーザが使用するクライアント２は、サーバ１から各ウェブページのユーザインターフェース部品対して発話による入力文字が定義されたデータベースをダウンロードしておく。そして、クライアント２で実行されるエンジン２１が、ブラウザの表示するウェブページに重畳して疑似画面を生成し、重畳して表示する。この状態で、エンドユーザが音声を発話すると、音声文字変換ツールを通じて得られた単語（あるいは形態素）を含むテキストから該当するユーザインターフェース部品が決定され、そのユーザインターフェース部品に応じた処理が実行される。例えば、そのユーザインターフェース部品がテキスト入力フィールドのタイトル、プルダウンメニュのタイトルである場合、そのユーザインターフェース部品がフォーカスされた状態になる。また、そのテキストの該当部分がテキスト入力フィールドに設定される。また、そのそのテキストの該当部分がプルダウンメニュの選択肢である場合は、その選択肢が選択される。 Meanwhile, the client 2 used by the end user downloads in advance, from the server 1, the database in which input characters by utterance are defined for the user interface components of each web page. The engine 21 executed on the client 2 then generates a pseudo screen and displays it superimposed on the web page displayed by the browser. In this state, when the end user speaks, the corresponding user interface component is determined from the text containing the words (or morphemes) obtained through the voice character conversion tool, and the process corresponding to that user interface component is executed. For example, when the user interface component is the title of a text input field or the title of a pull-down menu, that user interface component enters the focused state. The corresponding part of the text is set in the text input field. If the corresponding part of the text is an option of a pull-down menu, that option is selected.

このように、本情報システムによれば、ウェブページを定義する定義ファイル（ＨＴＭＬ、ＸＭＬ、ＪＳＰ、ＩＩＳ等）、およびそのウェブページを構成するアプリケーションプログラムを変更することなく、ウェブ上のユーザインターフェースに音声認識機能を追加することができる。 Thus, according to this information system, a voice recognition function can be added to a user interface on the web without changing the definition file that defines the web page (HTML, XML, JSP, IIS, etc.) or the application program that makes up the web page.

＜変形例＞ <Modification>
上記実施形態では、主として、ネットワーク上のサーバ１とクライアント２とを含む情報システムにおいて、音声認識機能を追加する例を示した。しかし、本発明の実施は、このような構成には限定されない。例えば、スタンドアロンのコンピュータにおいて、発話定義ツールとエンジン２１の両方を搭載してもよい。すなわち、スタンドアロンのコンピュータにおいて、発話定義ツールによって構築されたデータベースを使用し、そのコンピュータ上で表示されるウェブページに音声入力するようにしてもよい。 The above embodiment mainly showed an example of adding a voice recognition function in an information system including the server 1 and the client 2 on a network. However, the implementation of the present invention is not limited to such a configuration. For example, both the utterance definition tool and the engine 21 may be installed on a stand-alone computer. That is, a stand-alone computer may use the database constructed by the utterance definition tool and accept voice input for a web page displayed on that computer.

また、発話定義ツールとエンジン２１とを一体化プログラムとして、エンドユーザに配布してもよい。その場合には、エンドユーザが、利用したいウェブアプリケーション等のウェブページ上のユーザインターフェース部品に、発話による文字列を関連付けてデータベースに登録すればよい。そして、エンドユーザ自身が設定したデータベースの定義を利用して、そのウェブページに音声入力すればよい。 Further, the utterance definition tool and the engine 21 may be distributed to end users as an integrated program. In that case, the end user may register a character string based on an utterance in a database in association with a user interface component on a web page such as a web application to be used. Then, using the database definition set by the end user himself / herself, voice input may be performed on the web page.

また、上記実施形態では、ウェブページ上に表示されるユーザインターフェースに音声入力機能を追加する例を示した。しかし、本発明の実施は、ウェブページ上のユーザインターフェース部品には限定されない。すなわち、ＨＴＭＬファイル、あるいは、ＸＭＬファイル以外であっても、画面上のユーザインターフェース部品の構造、あるいは、そのユーザインターフェース部品のタイトルを示す文字列、入力すべき文字列を定義ツールおよびエンジン２１のような外部プログラム（音声入力の対象となるアプリケーション以外のプログラム）が特定可能な場合には、本発明の実施が可能である。 The above embodiment showed an example of adding a voice input function to a user interface displayed on a web page. However, the implementation of the present invention is not limited to user interface components on web pages. That is, even for files other than HTML or XML files, the present invention can be implemented as long as an external program such as the definition tool and the engine 21 (a program other than the application targeted for voice input) can identify the structure of the user interface components on the screen, the character strings indicating the titles of those components, and the character strings to be input.

For example, on a stand-alone computer, the structure of a user interface built on a document creation program such as a word processor, a spreadsheet program, or a presentation tool may be analyzed from the macro definition information of that document creation program.

For example, the utterance definition tool may read the macro definition information and construct a database in the same way as in the above embodiment. The engine 21, having been provided with that database, then monitors the start-up of the document creation program and, when the document creation program is started, executes a pseudo process of the started document creation program. The user interface screen generated by the pseudo process is displayed superimposed on the user interface of the original document creation program. After this preparation, the voice input result is set in the user interface of the pseudo process in the same way as in the above embodiment.

Likewise, for example, when an application program consists of a user interface program and a processing program that communicate with each other by inter-process communication, a pseudo screen can be generated superimposed on the screen of the user interface program and a voice input function can be added in a stand-alone environment, as in the above embodiment. The result of voice input is converted into a character string, set on the screen of the user interface program of the pseudo process, and passed to the processing program by inter-process communication.
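The hand-over of a recognized character string to the processing program can be sketched as follows. This is only a minimal illustration: it assumes a connected socket pair as the communication channel, uses a thread as a stand-in for the separate processing program, and assumes a one-line "field=value" message format. The function names processing_program and hand_over, the field name, and the message format are hypothetical and are not specified in the embodiment.

```python
import socket
import threading


def processing_program(conn: socket.socket, results: list) -> None:
    # Stand-in for the processing program: reads one "field=value" line.
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        line = reader.readline()
    field_name, _, value = line.rstrip("\n").partition("=")
    results.append((field_name, value))


def hand_over(field_name: str, recognized_text: str) -> list:
    # Create a connected pair of sockets: one end for the user interface
    # side, the other for the processing side.
    ui_side, proc_side = socket.socketpair()
    results = []
    worker = threading.Thread(target=processing_program, args=(proc_side, results))
    worker.start()
    with ui_side:
        # The user interface side sends the field name and the recognized text.
        ui_side.sendall(f"{field_name}={recognized_text}\n".encode("utf-8"))
    worker.join()
    return results


# Example: a recognized postal code is passed on for the field "zip".
received = hand_over("zip", "1000001")
```

In a real application the two ends would belong to different processes, and the message format would be whatever the existing user interface program and processing program already exchange.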

In addition, for example, for an application in which the layout of components on a window is defined as a resource file outside the binary program, the user interface components can be extracted and a voice input function can be added by analyzing that resource file.

Computer-executable programs such as the engine 21 and the utterance definition tool may be installed on the client 2 or the server 1 over a network. These programs may also be stored on a computer-readable recording medium (for example, a DVD, a CD-ROM, or a removable disk) and distributed. Alternatively, computer-executable programs such as the engine 21 and the utterance definition tool may be stored on a server that provides application services on the network, and only the functions of the programs may be provided to the server 1 or the client 2.

A block diagram of an information system according to one embodiment of the present invention.
A flowchart showing processing on the server.
A flowchart showing client-side processing that uses the user interface to which the voice recognition function has been added.
A diagram showing the structure of the URL master.
A diagram showing the structure of the field master.
A diagram showing the structure of the utterance master.
A diagram showing the structure of the movement word master.
A diagram showing the reserved word master.
A diagram showing the unit master.
A diagram showing the attribute master.
A diagram showing the operation of associating voice input with user interface components on the server.
A diagram showing the movement word registration screen.
A diagram showing an example of processing on the client of an end user using Internet shopping.
A diagram showing an example of a web page filled in by voice input.
A diagram showing an example of inputting a postal code and a telephone number.

Explanation of symbols

1 Server
2 Client
10 Operation screen
11 Web page display section
12 Object hierarchy display section
13 Voice-enablement target URL display section
14 Recognition word registration section
15 Movement word registration button
16 Reading setting section
20 Browser
21 Engine
22 Speech-to-text conversion tool

Claims

A computer-executable program that causes a computer to function as:
means for retrieving component information defining a user interface component from screen display definition information defining a display mode of a first screen portion that is configured on a computer screen and includes the user interface component;
means for receiving input of character string information that corresponds to the user interface component and is to be specified by utterance; and
means for storing the character string information in an utterance component table in association with the component information.

The computer-executable program according to claim 1, further causing the computer to function as means for receiving a designation for processing the first screen portion.

The computer-executable program according to claim 1 or 2, further causing the computer to function as:
means for displaying a second screen portion including the user interface component; and
processing means for identifying the user interface component corresponding to character string information converted from a received utterance and executing processing according to that user interface component.

The computer-executable program according to claim 3, wherein the processing is display of a screen portion including the user interface component corresponding to the character string information, a change of the hierarchical relationship on the display between that screen portion and another screen portion, selection of the user interface component corresponding to the character string information, or setting of the character string information in the user interface component.

The computer-executable program according to any one of claims 2 to 4, further causing the computer to function as means for providing, via communication means, information of the utterance component table to another computer having means for displaying the first and second screen portions based on the screen display definition information.

The computer-executable program according to claim 2, further causing the computer to function as means for distributing, via communication means, to another computer having means for displaying the first and second screen portions based on the screen display definition information, a computer program including means for associating character string information converted from an utterance with the user interface component.

The computer-executable program according to claim 5 or 6, further causing the computer to function as means for receiving character string information converted from an utterance and set in the user interface component from the other computer.

A method in which a computer performs:
retrieving component information defining a user interface component from screen display definition information defining a display mode of a first screen portion that is configured on a computer screen and includes the user interface component;
receiving input of character string information that corresponds to the user interface component and is to be specified by utterance; and
storing the character string information in an utterance component table in association with the component information.

An information processing apparatus comprising:
means for retrieving component information defining a user interface component from screen display definition information defining a display mode of a first screen portion that is configured on a computer screen and includes the user interface component;
means for receiving input of character string information that corresponds to the user interface component and is to be specified by utterance; and
means for storing the character string information in an utterance component table in association with the component information.

A computer-executable program that causes a computer to function as:
means for retrieving screen display definition information that includes a user interface component defined by component information and defines a display mode of a first screen portion configured on a computer screen;
means for displaying a second screen portion including the user interface component;
means for storing an utterance component table that associates character string information specified by utterance with the component information;
means for acquiring character string information generated by accepting an utterance;
means for identifying the user interface component corresponding to the generated character string information; and
processing means for executing processing according to the user interface component.

The computer-executable program according to claim 10, wherein the second screen portion is configured to overlap with the first screen portion displayed according to the screen display definition information.

The computer-executable program according to claim 10 or 11, wherein the processing is display of a screen portion including the user interface component corresponding to the character string information, a change of the hierarchical relationship on the display between that screen portion and another screen portion, selection of the user interface component corresponding to the character string information, or setting of the character string information in the user interface component.

The computer-executable program according to any one of claims 10 to 12, further causing the computer to function as means for receiving information of the utterance component table provided from another information processing apparatus via communication means.

An information processing apparatus comprising:
means for retrieving screen display definition information that includes a user interface component defined by component information and defines a display mode of a first screen portion configured on a computer screen;
means for displaying a second screen portion including the user interface component;
means for storing an utterance component table that associates character string information specified by utterance with the component information;
means for acquiring character string information generated by accepting an utterance;
means for identifying the user interface component corresponding to the generated character string information; and
processing means for executing processing according to the user interface component.