JP2008096489A

JP2008096489A - Voice system, voice method, voice server, and voice program

Info

Publication number: JP2008096489A
Application number: JP2006274814A
Authority: JP
Inventors: Ichiro Uratani; 一郎裏谷; Hiroki Miyake; 洋樹三宅
Original assignee: Pentax Corp
Current assignee: Pentax Corp
Priority date: 2006-10-06
Filing date: 2006-10-06
Publication date: 2008-04-24

Abstract

PROBLEM TO BE SOLVED: To provide a voice system, a voice method, a voice server, and a voice program that obtain, for a text, voice data in which at least some of the words in the text are read aloud, and that provide the voice data in a form that makes it easy to cross-reference the text with its spoken reading.

SOLUTION: Words contained in the text are extracted. Based on the extracted words and the user's level information, words whose difficulty level is equal to or higher than the user's level are selected, voice data in which each selected word is read aloud is acquired, and an HTML file is created by embedding link anchors to the voice data in the text and is transmitted to a terminal.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a voice system, a voice method, a voice server, and a voice program for obtaining, for a text, voice data in which at least some of the words contained in the text are read aloud.

When learning a foreign language, particularly one such as Japanese that assigns native-word pronunciations to ideographs (so-called kun'yomi readings), one such as English that borrows words and their pronunciation rules from multiple languages, or one such as Russian in which the pronunciation of a vowel changes with the position of the accent, it is effective for the person learning the language (hereinafter, the learner) to study by cross-referencing documents familiar to the learner (for example, news articles or technical documents in a field of interest to the learner) with audio in which those documents are read aloud.

The fields that interest learners differ from one learner to another and span a wide range. It is therefore conceivable to acquire documents such as news articles and technical documents published on the Internet, generate voice data in which those documents are read aloud, and study while cross-referencing the voice data with the documents. Patent Document 1 describes a system that enables this kind of learning.

Patent Document 1: JP-A-2005-70304

Patent Document 1 discloses a gateway-server-type voice reading server. To obtain voice data in which a document published on the Internet is read aloud, a user of the voice reading server runs a user agent such as a web browser on a terminal (such as a PC) connected to the Internet and operates the user agent to send an HTTP request to the voice reading server. The HTTP request contains a URL (Uniform Resource Locator) identifying the document for which voice data is desired.

The voice reading server acquires the document corresponding to this URL and then extracts only its text portion. For example, if the document is written in HTML, only the text that remains after removing tags, comments, SGML declarations, and the like is extracted. The voice reading server generates voice data in which the extracted text is read aloud, using speech synthesis or a similar technique. Finally, the voice reading server sends either the voice data itself or the URL of the voice data as the response to the HTTP request. The user thus obtains the voice data and can refer to both the document and the audio reading of it.

The above configuration obtains voice data in which an arbitrary document published on the Internet is read aloud. For a given document, the voice data covers either the whole document or a sizable portion of it (one paragraph, one page, and so on). For learners who are not native speakers of the language, what matters in study is how particular words are pronounced. Because the above configuration reads in units of fairly long passages, however, it is often difficult, particularly for non-native speakers, to tell which part of the document is currently being read. For this reason, the configuration of Patent Document 1 is not necessarily well suited to natural-language learning.

The present invention has been made in view of the above problem. Its object is to provide a voice system, a voice method, a voice server, and a voice program that can provide voice data in a form that is easy for language learners to use, that is, a form in which a document and the audio reading it can easily be cross-referenced.

To achieve the above object, the present invention receives text information about a text and level information of a user from a terminal, acquires the text based on the received text information, extracts from the words contained in the acquired text, based on the level information, words whose reading difficulty is equal to or higher than the user's level, acquires voice data in which the extracted words are read aloud, creates an HTML file by embedding link anchors to the voice data in the text, and transmits the created HTML file to the terminal.

Thus, with the configuration of the present invention, voice data is created only for words that match the acquisition level of the language learner using the system. In addition, the user receives an HTML file in which link anchors to this voice data are embedded in the text the user wants read aloud. Opening the HTML file displays a hypertext document in which link anchors are assigned to words here and there in the document, and by operating a link anchor to obtain and play the voice data for that word, the user can learn how that particular word is pronounced.

Preferably, the text information includes either the URL of the text or the text itself. The voice data in which the extracted words are read aloud may be generated by speech synthesis. The predetermined network is, for example, the Internet.

As described above, the present invention realizes a voice system that can provide voice data in a form in which a document and the audio reading it can easily be cross-referenced.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a conceptual diagram showing the entire voice system according to this embodiment. In this embodiment, the voice system 1 includes a voice server 100 and a terminal 200.

The terminal 200 can connect to the Internet by dial-up, xDSL, or a similar connection. The terminal 200 is, for example, a PC capable of running a web browser; a user of the voice system 1 runs and operates the web browser on the terminal 200 to obtain, from the voice server 100, voice data for words in a desired document. The terminal 200 has devices for playing voice data (a PCM sound source and a speaker or headphones) and can play the voice data obtained from the voice server 100.

The voice server 100 includes a voice gateway server 110 connected to the Internet, and a level storage database 120 and a TTS (Text-To-Speech) server 130 that are connected to the gateway server 110 via a LAN (Local Area Network).

The voice gateway server 110 is a kind of Web server that can exchange data with the terminal 200 over HTTP (HyperText Transfer Protocol). The voice gateway server 110 also functions as an HTTP user agent and can, in response to a request from the terminal 200, acquire document data from another web server 300 on the Internet.

The level storage database 120 is a database that associates each word with the language acquisition level needed to read that word aloud with ease. For any word contained in a document, the voice gateway server 110 can obtain the language acquisition level associated with that word.
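The word-to-level association held by the level storage database 120 can be sketched as a simple lookup table. This is only an illustration: the words, the four-step level scale, the `lookup_level` name, and the choice to treat unknown words as the most difficult are all assumptions, not details taken from the embodiment or from FIG. 4.

```python
# Hypothetical in-memory stand-in for the level storage database 120.
# The words and levels below are illustrative, not the contents of FIG. 4.
WORD_LEVELS = {
    "school": 1,
    "document": 2,
    "pronunciation": 3,
    "ideograph": 4,
}

# Assumption: a word missing from the table is treated as most difficult.
DEFAULT_LEVEL = 4


def lookup_level(word: str) -> int:
    """Return the acquisition level associated with a word (case-insensitive)."""
    return WORD_LEVELS.get(word.lower(), DEFAULT_LEVEL)
```

In a deployment along the lines of the embodiment, the table would sit behind a database query over the LAN rather than in a Python dictionary.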

When the TTS server 130 receives text from the voice gateway server 110, it generates, by speech synthesis, voice data in which the text is read aloud and sends the voice data to the voice gateway server 110.

In the configuration described above, only the voice gateway server 110 is connected to the Internet, and the level storage database 120 and the TTS server 130 are connected to the voice gateway server 110 via the LAN. The present invention is not limited to this configuration, however. For example, either or both of the level storage database 120 and the TTS server 130 may be connected to the Internet and exchange data with the voice gateway server 110 over the Internet. Also, although this embodiment shows the voice gateway server 110, the level storage database 120, and the TTS server 130 as separate devices, a configuration in which a single server device combines the functions of all three is also within the scope of the present invention.

The procedure by which the user of the terminal 200 (a language learner) obtains voice data for a document with the configuration described above is as follows. First, the user operates the web browser to connect to the voice gateway server 110. When the connection is established, a document URL input page is displayed in the document display area of the browser, as shown in FIG. 2.

The document URL input page displays a single-line text input control T1, a button B1, and radio buttons R1. The text input control T1 is the field in which the user of the terminal 200 enters the URL of the document (plain text, an HTML document, or the like) to be read aloud. The user enters characters into the text input control T1 using the keyboard of the terminal 200.

Four radio buttons R1 are arranged vertically in FIG. 2, and the user selects his or her language acquisition level by selecting one of them. The user operates the mouse of the terminal 200 to move the cursor C onto the desired radio button R1 and then clicks the mouse button to select it.

The button B1 sends the contents of the text input control T1 and the radio buttons R1 to the voice gateway server 110. The user operates the mouse of the terminal 200 to place the cursor C over the button B1 and then clicks the mouse button to send those contents.
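A page with the three kinds of controls described above could be generated roughly as follows. This is a sketch under stated assumptions: the field names `url` and `level`, the `/vocalize` action path, the level labels, and the default selection are all hypothetical, since the embodiment specifies only one text input T1, four radio buttons R1, and one button B1.

```python
from html import escape

# Hypothetical labels; the embodiment only states that there are four levels.
LEVEL_LABELS = ("Level 1", "Level 2", "Level 3", "Level 4")


def build_input_page(action: str = "/vocalize") -> str:
    """Return an HTML form with text input T1, four radio buttons R1, and button B1."""
    radios = "\n".join(
        '<label><input type="radio" name="level" value="{v}"{c}> {t}</label><br>'.format(
            v=index,
            c=" checked" if index == 1 else "",  # assumption: level 1 preselected
            t=escape(label),
        )
        for index, label in enumerate(LEVEL_LABELS, start=1)
    )
    return (
        '<form method="post" action="{a}">\n'
        '<input type="text" name="url" size="60">\n'  # text input control T1
        "{r}\n"                                         # radio buttons R1
        '<input type="submit" value="Send">\n'         # button B1
        "</form>\n"
    ).format(a=escape(action, quote=True), r=radios)
```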

In other words, the user enters the URL of the desired document into the text input control T1 using the keyboard or the like, selects the radio button R1 that matches the user's own language acquisition level, and finally operates the button B1 to send the URL of the desired document and the user's own language acquisition level to the gateway server 110.
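The data sent when the button B1 is operated amounts to two form fields. The sketch below shows one way the body of such a request could be encoded on the terminal side and decoded on the gateway side, using Python's standard `urllib.parse`. The field names `url` and `level` and the integer level scale 1 to 4 are assumptions made for illustration.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlencode


def build_form_body(document_url: str, level: int) -> str:
    """Encode the document URL and the user's level as a form-encoded body."""
    if level not in (1, 2, 3, 4):  # assumption: four levels, numbered 1 to 4
        raise ValueError("level must be between 1 and 4")
    return urlencode({"url": document_url, "level": level})


def parse_form_body(body: str) -> Tuple[str, int]:
    """Recover (document URL, level) from a form-encoded body."""
    fields = parse_qs(body)
    return fields["url"][0], int(fields["level"][0])
```

For example, `parse_form_body(build_form_body("http://example.com/news.html?id=1", 2))` round-trips the URL and the level unchanged.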

When the gateway server 110 receives the URL of the document and the user's language acquisition level from the terminal 200, it executes the routine shown in the flowchart of FIG. 3. When the routine starts, step S1 is executed first.

In step S1, the gateway server 110 checks whether a document corresponding to the URL sent from the terminal 200 exists. If no such document exists, or if a document exists but is in an encoding the gateway server 110 does not support (S1: NO), step S11 is executed, in which an error message is sent to the terminal 200. If, on the other hand, step S1 confirms that a document corresponding to the URL exists and that it is written in an encoding the gateway server 110 supports (S1: YES), the process proceeds to step S2.
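The decision made in step S1 can be sketched as a small predicate, read here as requiring both that the document exists and that its encoding is one the gateway can handle. The set of supported encodings, the use of an HTTP status code of 200 as the sign that the document exists, and the function name are assumptions; the embodiment does not say how the check is carried out.

```python
from typing import Optional

# Assumption: an illustrative set of encodings the gateway server can handle.
SUPPORTED_ENCODINGS = {"utf-8", "shift_jis", "euc-jp", "iso-8859-1"}


def document_is_usable(status_code: Optional[int], charset: Optional[str]) -> bool:
    """Return True for the S1: YES branch, False for the S1: NO branch.

    status_code is the HTTP status obtained for the URL (None if the request
    failed outright); charset is the declared character encoding, if any.
    """
    if status_code != 200:  # no document corresponds to the URL
        return False
    if charset is None:     # assumption: an undeclared encoding is rejected
        return False
    return charset.strip().lower() in SUPPORTED_ENCODINGS
```

A False result corresponds to step S11, where an error message is returned to the terminal.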

In step S2, the gateway server 110 uses its user agent function to download the document from the external web server 300 corresponding to the URL. The process then proceeds to step S3.

In step S3, the document is formatted. If the document is an HTML file, unnecessary tags, comments, SGML declarations, and the like are removed so that only the plain text is extracted. If the document is pre-wrapped plain text (that is, text in which a line-break code has been forcibly inserted every fixed number of characters), the line-break codes are removed. Furthermore, if the language of the document, like Japanese, has no character that separates words, morphological analysis is performed to split the document into words. Because these formatting methods are known, a detailed description is omitted. The process then proceeds to step S4.
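The first two formatting cases in step S3 can be illustrated with Python's standard `html.parser` module. This is a minimal sketch, not the method of the embodiment: it assumes a space-delimited language, additionally drops the contents of `script` and `style` elements, and does not attempt the morphological analysis needed for Japanese. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect character data; tags, comments, and declarations are dropped."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Strip markup from an HTML document and normalize whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def unwrap_plain_text(text: str) -> str:
    """Remove hard line breaks from pre-wrapped, space-delimited plain text."""
    return " ".join(text.split())
```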

In step S4, the words in the document formatted in step S3 are extracted one at a time, in order from the beginning of the document. Words that occur frequently within a single document and are known to be very easy for language learners, such as particles and auxiliary verbs in Japanese or forms of the verb "be", pronouns, and auxiliary verbs in English, may be excluded from extraction. The process then proceeds to step S5.
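For an English document, the extraction in step S4 with the optional exclusion of very easy words could look like the sketch below. The regular expression used as a tokenizer and the contents of the skip list are assumptions chosen for illustration; the embodiment names only the categories of words that may be skipped.

```python
import re
from typing import Iterator

# Assumption: a small illustrative skip list of forms of "be", pronouns,
# and auxiliary verbs.
SKIP_WORDS = {
    "be", "is", "am", "are", "was", "were",
    "i", "you", "he", "she", "it", "we", "they",
    "can", "will", "must",
}

_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def iter_words(text: str) -> Iterator[str]:
    """Yield words in document order, omitting those on the skip list."""
    for match in _WORD.finditer(text):
        word = match.group(0)
        if word.lower() not in SKIP_WORDS:
            yield word
```

With this skip list, `list(iter_words("She is reading the document."))` yields `["reading", "the", "document"]`.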

In step S5, the gateway server 110 queries the level storage database 120 about the word extracted in step S4. As shown in FIG. 4, the level storage database 120 allows the level of each word to be looked up. If the language is Japanese, low levels are assigned to words taught in primary education and to words that can be pronounced using on'yomi readings alone, and high levels are assigned to words whose pronunciation matches neither the on'yomi nor the kun'yomi readings of their individual kanji. When the level storage database 120 receives the text of a word from the gateway server 110, it returns the level of that word to the gateway server 110. The gateway server 110 thus obtains the difficulty level of the word. The gateway server 110 then executes step S6 (FIG. 3).

In step S6, the gateway server 110 compares the difficulty level of the word obtained in step S5 with the user's language acquisition level received from the terminal 200 at the start of this routine. If the difficulty level of the word is equal to or higher than the user's language acquisition level (S6: YES), the gateway server 110 determines that voice data reading the word aloud is needed and proceeds to step S7. If the difficulty level of the word is lower than the user's language acquisition level (S6: NO), the gateway server 110 determines that no voice data needs to be prepared for the word, returns to step S4, and extracts the next word in the document.

In step S7, the gateway server 110 queries the TTS server 130. Specifically, the gateway server 110 sends the text of the word extracted in step S4 to the TTS server 130. The TTS server 130 creates, by speech synthesis, voice data in which the word is read aloud and returns it to the gateway server 110. The gateway server 110 saves the received data in the storage means of the server. To keep the storage means from filling up with voice data that is no longer needed, the voice data may be deleted a fixed time (for example, one hour) after step S7 is executed. The process then proceeds to step S8.
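The optional deletion of voice data a fixed time after step S7 can be sketched as a store with a time-to-live. The `AudioStore` class, its method names, and the in-memory dictionary are assumptions for illustration; the embodiment says only that stored voice data may be erased after, for example, one hour.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class AudioStore:
    """Hypothetical store for synthesized voice data with a fixed lifetime."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without waiting
        self._items: Dict[str, Tuple[float, bytes]] = {}

    def put(self, key: str, audio: bytes) -> None:
        """Save voice data together with the time it was stored."""
        self._items[key] = (self._clock(), audio)

    def purge_expired(self) -> None:
        """Delete every entry that has been stored for ttl_seconds or longer."""
        now = self._clock()
        expired = [k for k, (stored_at, _) in self._items.items()
                   if now - stored_at >= self._ttl]
        for key in expired:
            del self._items[key]

    def get(self, key: str) -> Optional[bytes]:
        """Return the voice data for key, or None if absent or expired."""
        self.purge_expired()
        item = self._items.get(key)
        return item[1] if item else None
```

One consequence of this design is that a link in an HTML file delivered earlier stops resolving once its voice data has expired, so the lifetime would need to be at least as long as a typical study session.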

In step S8, a link anchor (the A element in HTML) pointing to the voice data obtained in step S7 is embedded in the document formatted in step S3. The process then proceeds to step S9.

In step S9, it is determined whether the word search (step S4) has reached the end of the document. If the search has reached the end of the document (S9: YES), the process proceeds to step S10. If it has not (S9: NO), words that should be extracted may still remain, so the process returns to step S4 and extracts the next word.
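The loop formed by steps S4 through S9 can be sketched as a single pass over the text. Everything here beyond the step order is an assumption: the regular-expression tokenizer suits only space-delimited languages, the level table is passed in as a dictionary, `audio_url_for` is a hypothetical callback standing in for the TTS query and storage of step S7, and words missing from the table are left unlinked.

```python
import re
from html import escape
from typing import Callable, Dict


def annotate(text: str, user_level: int, levels: Dict[str, int],
             audio_url_for: Callable[[str], str]) -> str:
    """Return HTML in which sufficiently difficult words link to their voice data."""
    out = []
    pos = 0
    for match in re.finditer(r"[A-Za-z]+", text):          # S4: next word
        word = match.group(0)
        out.append(escape(text[pos:match.start()]))          # text between words
        level = levels.get(word.lower())                     # S5: look up level
        if level is not None and level >= user_level:        # S6: compare levels
            url = audio_url_for(word)                        # S7: obtain voice data
            out.append('<a href="%s">%s</a>'                 # S8: embed link anchor
                       % (escape(url, quote=True), escape(word)))
        else:
            out.append(escape(word))
        pos = match.end()
    out.append(escape(text[pos:]))                           # S9: end of document
    return "".join(out)
```

For a user at level 3 and a table in which "ideograph" has level 4, only "ideograph" becomes a link; lowering the user's level to 1 makes every word found in the table a link.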

In step S10, the data required of an HTML file, such as a header and a title element, is added to the document that was formatted in step S3 and had link anchors embedded in step S8, and an HTML file is created. The gateway server 110 then sends this HTML file to the terminal 200 and ends the routine.
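Step S10 can be sketched as wrapping the annotated body in a minimal HTML skeleton. The function name, the choice of a single paragraph element for the body, and the particular head elements are assumptions; the embodiment states only that a header, a title element, and similar required data are added.

```python
from html import escape


def build_html_page(body_html: str, title: str, charset: str = "utf-8") -> str:
    """Wrap already-annotated body HTML in a minimal HTML document.

    body_html is inserted as-is because it already contains the link anchors;
    only the title and charset are escaped here.
    """
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '<meta charset="{charset}">\n'
        "<title>{title}</title>\n"
        "</head>\n"
        "<body>\n"
        "<p>{body}</p>\n"
        "</body>\n"
        "</html>\n"
    ).format(charset=escape(charset, quote=True), title=escape(title), body=body_html)
```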

When the terminal 200 receives this HTML file, it controls the browser to display the file as a Web page. FIG. 5 shows an example of the Web page displayed in the browser. As shown in FIG. 5, words that have a link to voice data are displayed with emphasis (underlined, in this embodiment), and by operating such a link (for example, by placing the cursor over the word and clicking the mouse button), the user of the terminal 200 can download and listen to the voice data in which that word is read aloud.

As described above, according to this embodiment, the user of the terminal 200 can obtain an HTML document in which voice data reading the words of the document aloud is presented as links within the document. The user can therefore study a natural language efficiently while cross-referencing the text of the document with audio of the words in it. Furthermore, because the words associated with audio change according to the learner's level, the learner can obtain an HTML document suited to his or her own level.

Although this embodiment is configured so that the URL of the desired document is sent from the terminal 200, the document itself may instead be sent directly to the voice gateway server.

Brief description of the drawings

FIG. 1 is a conceptual diagram showing the entire voice system according to an embodiment of the present invention.
FIG. 2 shows the document URL input page displayed on the terminal in the embodiment of the present invention.
FIG. 3 is a flowchart of the program executed by the voice gateway server in the embodiment of the present invention.
FIG. 4 shows an example of the data stored in the level storage database in the embodiment of the present invention.
FIG. 5 shows a Web page displayed on the terminal in the embodiment of the present invention.

Explanation of symbols

1 Voice system
100 Voice server
110 Voice gateway server
120 Level storage database
130 TTS server
200 Terminal
300 Web server

Claims

1. A voice system comprising a terminal and a voice server connected to each other via a predetermined network, wherein
the terminal comprises:
information input means for inputting text information about a text and level information of a user;
communication means for transmitting the text information and the level information to the voice server; and
display means,
the voice server comprises:
text acquisition means for acquiring the text corresponding to the text information;
word extraction means for extracting from the text, based on the level information, a word whose reading difficulty level is equal to or higher than the level of the user;
voice data acquisition means for acquiring voice data of the word extracted by the word extraction means;
HTML data creation means for creating HTML data in which a link anchor to the voice data is embedded in the text; and
data transmitting means for transmitting the HTML data created by the HTML data creation means to the terminal, and
the display means displays the HTML data received from the voice server.

2. The voice system according to claim 1, wherein
the voice server further comprises a difficulty level database in which words and the reading difficulty levels of the words are stored in association with each other, and
the word extraction means queries the difficulty level database for each word contained in the text to obtain the reading difficulty level of each word, and thereby extracts from the text a word whose reading difficulty level is equal to or higher than the level of the user.

3. The voice system according to claim 1, wherein the text information includes a URL of the text.

4. The voice system according to claim 1, wherein the text information includes the text itself.

5. The voice system according to claim 3, wherein the text is provided by a content providing server connected to the predetermined network.

6. The voice system according to claim 5, wherein the content providing server is a Web server.

7. The voice system according to claim 1, further comprising speech synthesis means for generating, by speech synthesis, voice data in which the word extracted by the word extraction means is read aloud.

8. The voice system according to claim 1, wherein the predetermined network is the Internet.

9. A voice method comprising:
receiving text information about a text and level information of a user from a terminal;
acquiring the text based on the received text information;
extracting, based on the level information, from the words contained in the text, a word whose reading difficulty level is equal to or higher than the level of the user;
acquiring voice data in which the extracted word is read aloud;
creating an HTML file by embedding a link anchor to the voice data in the text; and
transmitting the created HTML file to the terminal.

10. The voice method according to claim 9, wherein the text information includes a URL of the text.

11. The voice method according to claim 9 or 10, wherein the text information includes the text itself.

12. The voice method according to claim 10, wherein the text is provided by a content providing server connected to the predetermined network.

13. The voice method according to claim 12, wherein the content providing server is a Web server.

14. The voice method according to any one of claims 9 to 13, wherein the voice data in which the extracted word is read aloud is generated by speech synthesis.

15. A voice server comprising:
text receiving means for receiving text information about a text and level information of a user from a terminal;
text acquisition means for acquiring the text based on the received text information;
word extraction means for extracting, based on the level information, from the words contained in the text, a word whose reading difficulty level is equal to or higher than the level of the user;
voice data acquisition means for acquiring voice data in which the extracted word is read aloud;
HTML data creation means for creating HTML data by embedding a link anchor to the voice data in the text; and
data transmission means for transmitting the created HTML data to the terminal.

16. A voice program for executing:
a text reception procedure for receiving text information about a text and level information of a user from a terminal;
a text acquisition procedure for acquiring the text based on the received text information;
a word extraction procedure for extracting, based on the level information, from the words contained in the text, a word whose reading difficulty level is equal to or higher than the level of the user;
a voice data acquisition procedure for acquiring voice data in which the extracted word is read aloud;
an HTML data creation procedure for creating HTML data by embedding a link anchor to the voice data in the text; and
a data transmission procedure for transmitting the created HTML data to the terminal.