JP2017102939A

JP2017102939A - Authoring device, authoring method, and program

Info

Publication number: JP2017102939A
Application number: JP2016251382A
Authority: JP
Inventors: Hiroshi Ueno (植野　博)
Original assignee: ProField Co Ltd
Current assignee: ProField Co Ltd
Priority date: 2016-12-26
Filing date: 2016-12-26
Publication date: 2017-06-08

Abstract

PROBLEM TO BE SOLVED: To provide an authoring device capable of performing authoring processing on an object by using voice.
SOLUTION: The authoring device includes: an authoring data storage part 101 for storing authoring data in which one or more objects are arranged; a voice reception part 102 for receiving a voice; a voice recognition part 103 for performing voice recognition processing on the voice received by the voice reception part 102; an authoring processing part 104 for performing authoring processing on the object arranged in the authoring data stored in the authoring data storage part 101 in accordance with the result of the voice recognition processing of the voice recognition part 103; and an output part 105 for outputting the processing result of the authoring processing part.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an authoring device and the like that perform authoring processing.

As a conventional technique, an automatic electronic publishing support system using a computer has been known (see, for example, Patent Document 1). The system has content acquisition means for receiving content information from a content provider via a telecommunication line and storing it in a storage device, advertisement acquisition means for receiving advertisement information from an advertiser via a telecommunication line and storing it in a storage device, and automatic layout means for automatically arranging the content information and the advertisement information based on a predetermined layout determination rule to generate an electronic publication.

Patent Document 1: JP 2012-242865 A (page 1, FIG. 1, etc.)

However, conventional authoring devices have had the problem that authoring processing cannot be performed on an object by using voice. For example, authoring processing such as changing the size or position of an object arranged on a page could not be performed by voice. Consequently, authoring processing could not be performed easily when, for example, the user's hands were occupied and manual operation was difficult. It was also difficult for a user unskilled in manual operation to perform authoring processing. There was a further problem that, for example, voice input and manual input could not be combined.

The present invention has been made to solve the problems described above, and an object thereof is to provide an authoring device and the like capable of performing authoring processing on an object by using voice.

The authoring device of the present invention includes: an authoring data storage unit that stores authoring data in which one or more objects are arranged; a voice reception unit that receives a voice; a voice recognition unit that performs voice recognition processing on the voice received by the voice reception unit; an authoring processing unit that performs authoring processing on an object arranged in the authoring data stored in the authoring data storage unit in accordance with the result of the voice recognition processing by the voice recognition unit; and an output unit that outputs the processing result of the authoring processing unit.

With this configuration, authoring processing can be performed on an object by using voice.

In the authoring device of the present invention, the above authoring device further includes a display unit that displays one or more pieces of authoring data stored in the authoring data storage unit, and the authoring processing unit performs authoring processing with one or more objects arranged in the authoring data displayed by the display unit as processing targets.

With this configuration, authoring can be performed by voice on the displayed objects.

In the authoring device of the present invention, the authoring processing unit of the above authoring device performs authoring processing with only the objects that the display unit displays in their entirety within the display screen as processing targets.

With this configuration, authoring can be performed by voice with the objects displayed in their entirety within the display screen as processing targets.
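As a rough illustration of restricting processing to objects shown in full, the sketch below tests whether an object's bounding box lies entirely inside the visible area. The `Rect` type, its field names, and the use of axis-aligned rectangles are assumptions made for illustration; the source does not state how full visibility is determined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (hypothetical type, not defined in the source)."""
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height


def is_fully_displayed(obj: Rect, screen: Rect) -> bool:
    """Return True only if obj lies entirely inside screen."""
    return (
        obj.left >= screen.left
        and obj.top >= screen.top
        and obj.right <= screen.right
        and obj.bottom <= screen.bottom
    )


def fully_displayed(objects: dict[str, Rect], screen: Rect) -> list[str]:
    """Names of the objects that would qualify as processing targets."""
    return [name for name, box in objects.items() if is_fully_displayed(box, screen)]
```

An object that is only partly on screen is excluded, which matches the "only objects displayed in their entirety" wording above.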

In the authoring device of the present invention, the above authoring device further includes a display unit that displays one or more pieces of authoring data stored in the authoring data storage unit, and the authoring processing unit performs authoring processing with objects not displayed by the display unit as processing targets.

With this configuration, processing can be performed by voice with objects other than the displayed objects as processing targets.

In the authoring device of the present invention, the authoring data acquired by the authoring processing unit of the above authoring device has one or more layers on which objects can be arranged, and the authoring processing unit performs authoring processing on the layer of the authoring data designated by the voice recognition processing.

With this configuration, the objects to be processed can be specified layer by layer.

In the authoring device of the present invention, the authoring processing unit of the above authoring device identifies the object to be processed from at least part of the voice recognition result, and identifies the processing to be performed on the object from at least part of the voice recognition result.

With this configuration, identifying the object to be processed and the processing from the voice recognition result allows the designated processing to be performed on the designated processing target.

In the authoring device of the present invention, the authoring processing unit of the above authoring device identifies an acted-on target from at least part of the voice recognition result, and performs authoring processing on the acted-on target by using one or more objects.

With this configuration, processing that uses the objects to be processed can be performed on an acted-on target such as a page or a layer.

In the authoring device of the present invention, the above authoring device further includes a display unit that displays the authoring data stored in the authoring data storage unit, and the authoring processing unit performs authoring processing on an object when the object to be processed can be identified from at least part of the voice recognition result, and performs authoring processing with the objects displayed by the display unit as processing targets when the object to be processed cannot be identified.

With this configuration, when the processing target cannot be identified, the displayed objects are treated as processing targets, so that processing can be performed on appropriate processing targets.
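The fallback described above can be sketched as follows. This is a minimal illustration, assuming that objects are referred to by string identifiers and that the recognition step has already produced a list of candidate identifiers; the function and parameter names are hypothetical.

```python
def select_targets(recognized_ids, all_object_ids, displayed_ids):
    """Pick processing targets, falling back to the displayed objects.

    recognized_ids: identifiers extracted from the voice recognition
        result (may be empty or may name unknown objects).
    all_object_ids: identifiers of every object in the authoring data.
    displayed_ids: identifiers of the objects currently displayed.
    """
    known = set(all_object_ids)
    # Keep only identifiers that actually exist in the authoring data.
    named = [object_id for object_id in recognized_ids if object_id in known]
    if named:
        return named
    # No target could be identified: use what is currently displayed.
    return list(displayed_ids)
```

For example, `select_targets([], ["a", "b", "c"], ["b"])` falls back to `["b"]`, while `select_targets(["c"], ["a", "b", "c"], ["b"])` returns `["c"]`.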

In the authoring device of the present invention, the authoring processing unit of the above authoring device performs authoring processing on an object when the object to be processed can be identified from at least part of the voice recognition result, and performs authoring processing with all objects in the authoring data as processing targets when the object to be processed cannot be identified.

With this configuration, when the processing target cannot be identified, all objects that are not displayed are treated as processing targets, so that processing can be performed on appropriate display targets.

In the authoring device of the present invention, the authoring processing performed by the authoring processing unit of the above authoring device is processing that changes an attribute of an object.

With this configuration, an attribute of an object can be changed by voice.

In the authoring device of the present invention, the authoring processing performed by the authoring processing unit of the above authoring device is processing that changes the position of an object.

With this configuration, the position of an object can be changed by voice.

In the authoring device of the present invention, the authoring processing performed by the authoring processing unit of the above authoring device is processing that changes the color of an object.

With this configuration, the color of an object can be changed by voice.

In the authoring device of the present invention, the object in the above authoring device is an object that integrally has display data, which is data for display, and voice data, which is data of a voice.

With this configuration, the object to be processed can be designated by voice by using the voice data that the object has.

According to the authoring device and the like of the present invention, authoring processing can be performed on an object by using voice.

Block diagram of the authoring device according to an embodiment of the present invention
Flowchart illustrating the operation of the authoring device
Diagram showing an example of the authoring device
Diagram showing the authoring data management table of the authoring device
Schematic diagram showing the page structure of the authoring data of the authoring device
Diagram showing a display example of the authoring device
Diagram showing the recognition processing target management table of the authoring device
Diagram showing the recognition processing management table of the authoring device
Diagram showing the acted-on target management table of the authoring device
Diagram showing a display example of the authoring device
Schematic diagram showing the page structure of the authoring data of the authoring device
Diagram showing a display example of the authoring device
Schematic diagram showing the layers of a page of the authoring data of the authoring device
Schematic diagram showing the page structure of the authoring data of the authoring device
Diagram showing an example of the external appearance of a computer system according to the embodiment of the present invention
Diagram showing an example of the configuration of the computer system

Embodiments of the authoring device and the like are described below with reference to the drawings. Components given the same reference numerals in the embodiments operate in the same manner, so repeated description of them may be omitted.

(Embodiment)
FIG. 1 is a block diagram of an authoring device 1 according to the present embodiment.

The authoring device 1 includes an authoring data storage unit 101, a voice reception unit 102, a voice recognition unit 103, an authoring processing unit 104, an output unit 105, and a display unit 106.

The authoring device 1 can be realized by, for example, a computer, a portable information terminal, a mobile phone, a multifunctional mobile phone such as a so-called smartphone, or a tablet terminal.

The authoring data storage unit 101 stores one or more pieces of authoring data. Authoring data is data in which one or more objects are arranged. Authoring data is, for example, data acquired by authoring processing. Authoring processing is described later. Authoring data may be a final product such as digital content or multimedia data, or may be an intermediate product used to generate such a final product. When authoring data is an intermediate product, the final product can be acquired by, for example, rendering the authoring data or exporting it as data in a format specified in advance.

One piece of authoring data may have, for example, one or more pages. A page here is a concept that includes, for example, a sheet or a mount on which content and the like are arranged. A page may be regarded as, for example, a surface on which one or more objects are arranged. Each page is associated with a page identifier such as a page number. One page may have one or more layers. Each layer is associated with a layer identifier such as a layer name or a layer number. Pages and layers are known in the technical field of authoring, so detailed description of them is omitted here. Note that a piece of authoring data need not have pages. One piece of authoring data is, for example, the authoring data that constitutes one file.

The one or more objects arranged in authoring data are, for example, data subject to authoring. An object is, for example, data serving as a unit of authoring. For example, one object is data handled as one authoring target. An object is, for example, content data. An object is, for example, data obtained by digitizing a character string, video, a map, or the like, software that displays an image, a character string, or the like, or a combination of these. An object is, for example, data that has display data. Display data is, for example, data that can be visualized. An object is, for example, text data or image data. Text data is, for example, data having one or more character codes representing characters. Image data may be a still image or a moving image. A still image may be bitmap data or vector data. An object may also be two-dimensional or three-dimensional modeling data or the like. Note that character strings and images fixed to a page in advance, such as page numbers, and character strings and images fixed as a template or the like may be excluded from the objects subject to authoring, on the view that the user cannot author them freely.

An object may be, for example, data that integrally has display data and voice data. Data that integrally has display data and voice data is here called voice-attached data. For example, voice-attached data may be a file that integrally has display data and voice data. Voice data is data of a voice. Voice data is, for example, data representing the waveform of a voice. For example, voice data is data representing the change in voltage that represents the waveform of a voice. Voice data may be data obtained by sampling the waveform of a voice. Voice data may be uncompressed or compressed. Voice-attached data may further have feature amounts of the voice data stored in it. The feature amounts of voice data are described later. Voice-attached data may also have information obtained by decomposing its stored voice data into phonemes, information obtained by decomposing it into elements finer than phonemes (hereinafter referred to as phoneme pieces), information obtained by encoding these, and the like.

An object arranged in authoring data may be an object arranged in the authoring data by a link, or may be data stored (for example, embedded) in the authoring data. An object arranged by a link is, for example, data stored in an object storage unit or the like (not shown).
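The structure described so far (authoring data holding pages, pages holding layers, layers holding objects that are either embedded or linked) can be pictured with a small data model. This is only a sketch under assumed names: the class names, field names, and the `objects_on` helper are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuthoringObject:
    object_id: str
    kind: str                          # e.g. "text" or "image"
    embedded: Optional[bytes] = None   # data stored inside the authoring data
    link: Optional[str] = None         # reference to data stored elsewhere


@dataclass
class Layer:
    name: str                          # layer identifier
    objects: list[AuthoringObject] = field(default_factory=list)


@dataclass
class Page:
    number: int                        # page identifier
    layers: list[Layer] = field(default_factory=list)


@dataclass
class AuthoringData:
    pages: list[Page] = field(default_factory=list)

    def objects_on(
        self, page_number: int, layer_name: Optional[str] = None
    ) -> list[AuthoringObject]:
        """Collect objects on a page, optionally restricted to one layer."""
        found: list[AuthoringObject] = []
        for page in self.pages:
            if page.number != page_number:
                continue
            for layer in page.layers:
                if layer_name is None or layer.name == layer_name:
                    found.extend(layer.objects)
        return found
```

With such a model, naming a page or a layer is enough to obtain the objects arranged on it, which is one way a page or layer identifier could stand in for a set of objects.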

The authoring data storage unit 101 stores, for example, authoring data that the authoring processing unit 104 has acquired by performing authoring processing. The authoring data storage unit 101 stores, for example, authoring data output by the output unit 105 described later. It may also store authoring data received from outside or authoring data read from a storage unit or the like (not shown).

Storage here is a concept that includes temporary storage and the like during the creation of authoring data. For example, a temporary file of authoring data acquired by the authoring processing unit 104 (described later) during creation of the authoring data may be stored in the authoring data storage unit 101, and this temporary file may be updated as appropriate by the authoring processing unit 104. Acquisition by the authoring processing unit 104 here may be generation by the authoring processing unit 104, or may be reading by the authoring processing unit 104 of authoring data stored in a storage unit (not shown).

The authoring data storage unit 101 is preferably a non-volatile recording medium, but can also be realized by a volatile recording medium. The same applies to the other storage units.

The voice reception unit 102 receives a voice. The voice reception unit 102 receives, for example, a voice input via a microphone (not shown) or the like. The voice received by the voice reception unit 102 is, for example, a voice signal. The voice reception unit 102 acquires, for example, voice data representing the received voice. For example, the voice reception unit 102 samples the received voice to acquire voice data. The voice received by the voice reception unit 102 may also be regarded as voice data. For example, the voice reception unit 102 may receive, as a voice, voice data transmitted or output from another device or from another component within the authoring processing device.

The voice reception unit 102 may or may not include input means such as a microphone for receiving a voice. The voice reception unit 102 can be realized by a device driver for the input means, control software for a menu screen, or the like.

The voice recognition unit 103 performs voice recognition processing on the voice received by the voice reception unit 102. The voice recognition unit 103 performs, for example, voice recognition processing to acquire information that includes information indicating the processing target of authoring processing, information indicating the authoring processing to be executed (hereinafter referred to as the processing to be executed), and the like. For example, the voice recognition unit 103 may acquire the information indicating the processing target and the information indicating the processing to be executed themselves, or may acquire data such as text data that contains them.

A processing target is, for example, something subject to authoring processing, such as an object arranged in authoring data. Information indicating a processing target is, for example, an identifier of the processing target or information indicating an attribute of the processing target. An identifier of a processing target is, for example, the file name of the object to be processed, a name set for the processing target (for example, an object name or a layer name), or a code composed of a character string or the like assigned to the processing target (for example, the ID code of the object to be processed). When the processing target is a text object, the identifier may be at least part of the text of that object. Information indicating an attribute of a processing target is, for example, information indicating the color, size, or data type (for example, image or text) of the object to be processed. It may be, for example, information indicating a range of these attributes. Information indicating an attribute of a processing target may also be, for example, information indicating the position of the processing target within the page (for example, coordinates). Information indicating one page or layer may be regarded as information indicating, as processing targets, the one or more objects arranged on that page or layer. Similarly, information indicating an object arrangement area placed on a page or the like, called a frame or a container, may be regarded as information indicating the objects arranged in that area. When the arranged objects include a voice-attached object, at least part of the voice data contained in the voice-attached object, or feature amounts, phonemes, phoneme pieces, or the like corresponding to at least part of that voice data, may be regarded as information indicating the processing target.

The processing to be executed is, for example, one or more authoring processes executed on the processing target. Information indicating the processing to be executed is, for example, information from which the processing to be executed can be identified, such as an instruction indicating the processing or the name of the processing. For example, information indicating the processing to be executed may be an instruction to move an object or the like, an instruction to delete an object or the like, or an instruction to change the color of an image object, or may be a character string such as "move", "delete", or "brighten the color" that expresses such an instruction in natural language. Information indicating the processing to be executed may further include parameters of the processing. A parameter is, for example, information indicating a movement distance or a movement direction. For example, a parameter is information indicating a movement direction and a movement amount, such as 5 pixels to the left or 10 pixels to the right. A parameter may also be information indicating the degree to which the brightness of the image represented by an image object is changed.
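To make the idea of processing parameters concrete, the following sketch extracts a movement direction and amount from a phrase such as "5 pixels to the left". The English phrasing, the regular expression, the inclusion of "up" and "down", and the convention that y grows downward are all assumptions made for illustration; the source gives only left and right examples and does not prescribe any parsing method.

```python
import re

# Unit vectors per direction; y grows downward (assumed screen convention).
_DIRECTIONS = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}

# Matches e.g. "5 pixels to the left" or "10 pixels right".
_MOVE_PATTERN = re.compile(r"(\d+)\s*pixels?\s+(?:to the\s+)?(left|right|up|down)")


def parse_move_parameters(text):
    """Return (dx, dy) in pixels, or None if no movement phrase is found."""
    match = _MOVE_PATTERN.search(text.lower())
    if match is None:
        return None
    amount = int(match.group(1))
    unit_x, unit_y = _DIRECTIONS[match.group(2)]
    return (amount * unit_x, amount * unit_y)
```

A phrase without a recognizable movement amount yields `None`, leaving the caller free to apply a default or ask again.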

The kind of voice recognition processing that the voice recognition unit 103 performs on the voice received by the voice reception unit 102 does not matter. For example, the voice recognition unit 103 performs voice recognition on the voice received by the voice reception unit 102 and acquires text data corresponding to the voice as the voice recognition result. This text data is, for example, the voice data corresponding to the voice received by the voice reception unit 102, converted into text. The text data acquired by this voice recognition contains information indicating the processing target and information indicating the processing to be executed. The specific examples described later assume that the voice recognition result is text data corresponding to the voice in this way. The processing of acquiring text data corresponding to a voice by voice recognition is a known technique, so detailed description of it is omitted here.

The voice recognition unit 103 may further determine whether the text data acquired by the voice recognition processing described above contains one or more character strings that match recognition character strings, which are character strings indicating processing targets, character strings indicating processing to be executed, and the like. When such character strings are contained, the voice recognition unit 103 may acquire the information indicating the processing target or the information indicating the processing to be executed that corresponds to each of the matched recognition character strings. In this case, a character string in the text data may be judged to match a recognition character string when it matches in a number or ratio equal to or greater than a threshold specified in advance.
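One possible reading of the threshold-based match is sketched below: slide a window the length of the recognition character string over the recognized text and accept the window if the ratio of characters that agree position by position reaches the threshold. Measuring the match per character, the sliding window, and the default of 0.8 are assumptions for illustration; the source says only that a number or ratio at or above a pre-specified threshold counts as a match.

```python
def match_ratio(candidate: str, pattern: str) -> float:
    """Fraction of positions at which two equal-length strings agree."""
    if not pattern or len(candidate) != len(pattern):
        return 0.0
    hits = sum(a == b for a, b in zip(candidate, pattern))
    return hits / len(pattern)


def contains_recognition_string(text: str, pattern: str, threshold: float = 0.8) -> bool:
    """True if some window of text matches pattern at or above threshold."""
    size = len(pattern)
    if size == 0 or len(text) < size:
        return False
    return any(
        match_ratio(text[start:start + size], pattern) >= threshold
        for start in range(len(text) - size + 1)
    )
```

With a threshold of 0.8, a slightly misrecognized word such as "delite" still matches the recognition character string "delete" (5 of 6 characters agree), while unrelated text does not.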

Recognition character strings include, for example, recognition character strings representing processing targets and recognition character strings representing processing to be executed. A recognition character string representing a processing target is a character string that represents the processing target, for example in natural language. A recognition character string representing processing to be executed is a character string that represents the processing, for example in natural language. A recognition character string representing a processing target is associated with, for example, information indicating the processing target. A recognition character string representing processing to be executed is associated with, for example, information indicating the processing to be executed. Information indicating the processing to be executed is, for example, an instruction for executing the processing or the name of such an instruction. An instruction here may be regarded as including a command. Recognition character strings may be stored in advance in, for example, a storage unit or the like (not shown).

For example, suppose the information indicating a processing target is the same as the recognition character string representing that target. When the voice recognition unit 103 detects, in the text data acquired by the voice recognition processing, a character string that matches a recognition character string representing a processing target, it may acquire the matched recognition character string itself as the information indicating the corresponding processing target. Conversely, suppose the information indicating a processing target is not the same as the recognition character string. When such a match is detected, the voice recognition unit 103 may acquire, as the information indicating the processing target, the information that is stored in a storage unit (not shown) or the like in association with the matched recognition character string.

Similarly, suppose the information indicating a process to be executed is the same as the recognition character string representing that process. When the voice recognition unit 103 detects, in the text data acquired by the voice recognition processing, a character string that matches a recognition character string representing a process to be executed, it may acquire the matched recognition character string itself as the information indicating the corresponding process. Conversely, suppose the information indicating a process to be executed is not the same as the recognition character string. When such a match is detected, the voice recognition unit 103 may acquire, as the information indicating the process to be executed, the information that is stored in a storage unit (not shown) or the like in association with the matched recognition character string.
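The two cases above (the recognition character string serves as the information itself, or separate information is stored in association with it) can be sketched as a lookup with a fallback. The tables, their contents, and the function name `resolve` are hypothetical and stand in for whatever the storage unit (not shown) would hold.

```python
# Hypothetical tables: recognition character string -> associated information.
# A string that has no entry is assumed to serve as the information itself.
TARGET_INFO = {"画像": "image"}
PROCESS_INFO = {"大きく": "enlarge", "移動": "move"}

def resolve(recognition_string, table):
    """Return the stored information, or the recognition string itself if none is stored."""
    return table.get(recognition_string, recognition_string)
```

For instance, `resolve("大きく", PROCESS_INFO)` yields the stored instruction name, while a string such as an object name that is absent from the table is returned unchanged.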

Alternatively, instead of acquiring text data from the voice received by the voice reception unit 102 as described above, the voice recognition unit 103 may acquire feature amounts of the received voice and collate them with the feature amounts of recognition voices, that is, voices indicating processing targets, voices indicating processes to be executed, and the like. When the feature amounts of the received voice contain a feature amount whose degree of matching with the feature amount of one or more recognition voices is equal to or greater than a threshold, the voice recognition unit 103 may acquire the information indicating the processing target, or the information indicating the process to be executed, that corresponds to each of those recognition voices.

The recognition voices include, for example, recognition voices representing processing targets and recognition voices representing processes to be executed. A recognition voice representing a processing target is a voice that expresses the processing target, for example in natural language. Likewise, a recognition voice representing a process to be executed is a voice that expresses that process, for example in natural language. A recognition voice representing a processing target is associated with, for example, the processing target, and a recognition voice representing a process to be executed is associated with, for example, the process to be executed.

For example, a recognition voice indicating a processing target is the voice obtained by reading aloud, in natural language, an identifier or the like of the processing target. For example, it is the voice obtained by reading aloud, in natural language, the name assigned to an object to be processed. In such a case, the processing target indicated by the identifier or the like that was read aloud to obtain the recognition voice may be treated as the processing target corresponding to that recognition voice, and this identifier or the like may be used as the information indicating the processing target acquired above.

A recognition voice indicating a process to be executed is the voice obtained by reading aloud a text that expresses the process in natural language. For example, the recognition voices are a voice reading aloud the text "移動" ("move") and a voice reading aloud the text "大きく" ("larger"); the recognition voice for "移動" is associated with the process of executing a move, and the recognition voice for "大きく" is associated with the process of executing an enlargement. Saying that a recognition voice indicating a process to be executed is associated with that process may be understood to mean that the recognition voice is associated with an identifier indicating the process, specifically an instruction or an instruction name for executing the process. In such a case, the process indicated by the identifier or the like that was read aloud to obtain the recognition voice may be treated as the process to be executed corresponding to that recognition voice, and this identifier or the like may be used as the information indicating the process to be executed acquired above. The recognition voices, the feature amounts corresponding to the recognition voices, and the like may be stored in advance in a storage unit (not shown) or the like.

The feature amount acquired for a voice is, for example, a time series of feature vectors extracted from the audio signal cut out at short intervals. The feature amount acquired here is, for example, MFCC obtained by applying a discrete cosine transform to the output of a 24-channel filter bank using triangular filters, with 12 dimensions each of static parameters, delta parameters, and delta-delta parameters, and may further include normalized power, delta power, and delta-delta power (39 dimensions in total). Alternatively, the feature amount may be 25-dimensional, comprising 12 dimensions of MFCC, 12 dimensions of ΔMFCC, and 1 dimension of Δ log power. Various feature amounts can thus be used. A feature amount is also called a feature parameter or a feature vector.
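As an illustration of how the 39 dimensions add up, the following minimal sketch stacks, for each frame, 12 static coefficients with their delta and delta-delta values plus power, delta power, and delta-delta power. It assumes that the MFCC and normalized power values have already been computed elsewhere, and it uses a plain frame-to-frame difference for the delta terms, which is a simplification chosen for this example and is not specified in this description. The function names are hypothetical.

```python
def deltas(frames):
    """Frame-to-frame differences; the first frame gets zeros (a simplification)."""
    out = []
    prev = frames[0]
    for cur in frames:
        out.append([c - p for c, p in zip(cur, prev)])
        prev = cur
    return out

def stack_features(mfcc, power):
    """Build 39-dimensional vectors.

    mfcc:  list of frames, each a list of 12 static coefficients
    power: list of (already normalized) power values, one per frame
    """
    p = [[x] for x in power]                      # power as 1-dimensional frames
    d_mfcc, d_pow = deltas(mfcc), deltas(p)       # delta terms
    dd_mfcc, dd_pow = deltas(d_mfcc), deltas(d_pow)  # delta-delta terms
    # 12 + 12 + 12 + 1 + 1 + 1 = 39 dimensions per frame
    return [m + dm + ddm + pw + dp + ddp
            for m, dm, ddm, pw, dp, ddp
            in zip(mfcc, d_mfcc, dd_mfcc, p, d_pow, dd_pow)]
```

Dropping the delta-delta terms, static power, and delta-delta power from the concatenation would give the 25-dimensional variant mentioned above.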

The processing for acquiring feature amounts from voice or voice data is a known technique, so a detailed description is omitted here. Likewise, the processing for collating feature amounts acquired from voice or voice data and obtaining a value or the like indicating the degree of matching is a known technique, so a detailed description is omitted here.

Instead of the voice feature amounts described above, the voice recognition unit 103 may perform the collation using information obtained by decomposing the voice or voice data into phonemes or phoneme segments, and thereby acquire the information indicating the processing target or the information indicating the process to be executed. The processing for acquiring phonemes or phoneme segments, and the processing for searching with data in which phonemes or phoneme segments are encoded, are known techniques, so a detailed description is omitted here. The recognition voices, and the phonemes, phoneme segments, and the like corresponding to the recognition voices, may be stored in advance in a storage unit (not shown) or the like.

In the same manner as above, the voice recognition unit 103 may perform voice recognition processing to acquire information that further includes information indicating a processing location (被処理対象), which is distinct from the processing target. A processing location means, for example, a page, a layer, or the like on which an object that is the target of authoring processing is placed. A processing location may be regarded as, for example, the movement destination, movement source, copy destination, or copy source of an object. The information indicating a processing location is, for example, a page identifier (for example, a page number) or a layer identifier. For example, the voice recognition unit 103 may acquire the information indicating a processing location in the same manner as above, using a recognition character string or a recognition voice that indicates the processing location and is associated with the information indicating that processing location. The processing for acquiring, by voice recognition processing, information that includes information indicating a processing location is the same as the processing described above for acquiring information that includes information indicating a processing target, so a description is omitted here.

Similarly, the voice recognition unit 103 may perform voice recognition processing to acquire information that further includes information indicating a layer. The information indicating a layer acquired here, however, is information indicating a layer that is used to identify the processing targets placed on that layer. For example, the voice recognition unit 103 may acquire the information indicating a layer in the same manner as above, using a recognition character string or a recognition voice that indicates the layer and is associated with the information indicating that layer. The processing for acquiring, by voice recognition processing, information that includes information indicating a layer is the same as the processing described above for acquiring information that includes information indicating a processing target, so a description is omitted here.

When the voice recognition unit 103 acquires text data corresponding to the voice data as the result of voice recognition, the processing described above for acquiring, from the text data and by means of recognition character strings or the like, the information indicating the processing target or the information indicating the process to be executed may be performed by the authoring processing unit 104 instead of the voice recognition unit 103.

The authoring processing unit 104 performs authoring processing in accordance with the result of the voice recognition processing of the voice recognition unit 103. The authoring processing unit 104 performs authoring processing on one or more objects placed in one or more pieces of authoring data stored in the authoring data storage unit 101. For example, the authoring processing unit 104 performs authoring processing to create authoring data. For example, the authoring processing unit 104 performs authoring processing to update one or more pieces of authoring data stored in the authoring data storage unit 101. The authoring processing unit 104 may also, for example, perform authoring processing to acquire new authoring data from one or more pieces of authoring data stored in the authoring data storage unit 101.

Authoring processing is processing for producing or editing digital content such as electronic books and web pages, multimedia data, and the like. Authoring processing includes, for example, processing performed on objects placed in authoring data; processing for creating or deleting pages that make up authoring data, or for changing the page order or the like; processing for setting page attributes (for example, size or color); and processing for placing on a page one or more pieces of content stored in a storage unit (not shown) or the like, or for deleting content from a page. Authoring processing may also be processing for newly creating authoring data, or an operation for creating or deleting a layer on a page.

The authoring processing that the authoring processing unit 104 performs on an object is, in particular, the part of authoring processing that takes an object as its processing target. Authoring processing performed on an object is, for example, processing for changing an attribute of the object. An attribute of an object is, for example, an attribute used for displaying the object, such as the position, size, color, orientation, transparency, or resolution of the object. For example, the authoring processing unit 104 performs processing for changing the position of an object. The authoring processing unit 104 also performs processing for changing the color of an object. Processing for changing the color of an object is, for example, processing for changing at least one of the brightness, hue, and saturation of an image object or a text object. Alternatively, it may be processing for changing the RGB values or CMYK values of the object.

Authoring processing performed on an object may also be processing for placing the object on a page or deleting it from a page. It may also be processing for placing an object that is on one page onto another page, processing for duplicating an object, or the like. A page here is, for example, a page of the authoring data obtained as a result of the authoring processing.

For example, the authoring processing unit 104 identifies one or more objects to be processed from at least part of the result of the voice recognition processing by the voice recognition unit 103 (hereinafter referred to as the voice recognition result). It also identifies one or more processes to be executed from at least part of the voice recognition result. It then performs the identified processes on the identified objects to acquire authoring data.

For example, the authoring processing unit 104 acquires the information indicating the processing target from at least part of the information that the voice recognition unit 103 acquired as the voice recognition result, which includes information indicating the processing target and information indicating the process to be executed, and identifies the one or more objects indicated by this information. For example, when the information indicating the processing target is an identifier of the object to be processed (for example, an object name or an object file name), the authoring processing unit 104 identifies the object associated with this identifier as the processing target. When the information indicating the processing target is information on a condition for identifying the processing target, objects satisfying this condition may be identified. For example, when the information indicating the processing target is an attribute value for identifying the processing target (for example, a data type, color, or size), the authoring processing unit 104 identifies objects having this attribute value as the processing target. When the information indicating the processing target is information indicating a page or a layer, the authoring processing unit 104 may identify the objects placed on the indicated page or layer as the processing target. When information indicating a plurality of processing targets is acquired as the voice recognition result, the authoring processing unit 104 may identify as the processing target the objects specified by combining them (for example, the objects given by their logical sum or logical product).
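The ways of identifying processing targets described above (by attribute value, by page or layer, and by combining several conditions through a logical sum or logical product) can be sketched as follows. The `Obj` record, its field names, and the helper functions are hypothetical and only illustrate the idea under the assumption that each object carries a name, a data type, a page number, and a layer name.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Obj:
    name: str        # object identifier, e.g. an object name
    data_type: str   # e.g. "image" or "text"
    page: int        # page on which the object is placed
    layer: str       # layer on which the object is placed

def select(objects, **conditions):
    """Names of the objects whose fields equal every given condition."""
    return {o.name for o in objects
            if all(getattr(o, key) == value for key, value in conditions.items())}

def combine(selections, mode="and"):
    """Combine selections by logical product ("and") or logical sum ("or")."""
    selections = list(selections)
    if not selections:
        return set()
    if mode == "and":
        return set.intersection(*selections)
    return set.union(*selections)
```

For example, combining `select(objs, data_type="image")` and `select(objs, page=1)` with `mode="and"` keeps only the images on page 1, while `mode="or"` keeps every object that is an image or is on page 1.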

For example, the authoring processing unit 104 acquires the information indicating the process to be executed from at least part of the information that the voice recognition unit 103 acquired as the voice recognition result, which includes information indicating the processing target and information indicating the process to be executed, and identifies the one or more processes indicated by this information. It may then perform the identified processes on an object or the like to acquire authoring data. For example, the authoring processing unit 104 may perform the processes identified above on the processing target objects identified above to acquire authoring data.

As noted above, the processing for acquiring the information indicating the processing target or the information indicating the process to be executed, by means of recognition character strings or the like, from the text data that the voice recognition unit 103 obtained through voice recognition may be performed by the authoring processing unit 104 instead of the voice recognition unit 103. The authoring processing unit 104 may then use the acquired information to identify the processing target and the process to be executed. The specific examples given later are described for this case.

For example, the authoring processing unit 104 acquires a character string indicating the processing target from the head side of the text data that the voice recognition unit 103 acquired as the voice recognition result, that is, the text data represented by the voice received by the voice reception unit 102, and acquires a character string indicating the process to be executed from the tail side. Alternatively, the authoring processing unit 104 may acquire a character string indicating the processing target from the head side of the text data and acquire a character string indicating the process to be executed from the position immediately after it, or from a position separated from it by one or more clue phrases designated in advance. The authoring processing unit 104 may also acquire the character string indicating the processing target and the character string indicating the process to be executed by, for example, combining morphological analysis with clue phrases, or by using the positional relationship between character strings. For example, from the head side of the text data "画像を拡大する" ("enlarge the image"), the authoring processing unit 104 may acquire the noun phrase "画像" ("image") as the character string indicating the processing target, and acquire the noun phrase "拡大" ("enlarge"), which follows it across the clue phrase "を", as the character string indicating the process to be executed. In addition, from among the identifiers of objects placed in one piece of authoring data stored in the authoring data storage unit 101, objects already placed on a page or the like, or one or more objects placed in the authoring data that is currently at least partly displayed by the display unit 106, the identifiers contained in the text data acquired as the voice recognition result may be identified by search or the like. Similarly, from among the attribute values of objects already placed on a page or the like of the authoring data, the attribute values of objects currently displayed by the display unit 106, and attribute values prepared in advance in a storage unit (not shown) or the like, the attribute values contained in the text data acquired as the voice recognition result may be identified by search or the like. Similarly, from among the information indicating the authoring processing that the authoring processing unit 104 can execute, the information contained in the text data acquired as the voice recognition result may be identified by search or the like.
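The clue-phrase approach in the example above can be sketched as follows. This is a deliberately simple illustration that splits the text at the clue phrase "を" and checks the remainder against a list of known process strings; it does not perform morphological analysis, and the names `split_command`, `parse`, and `KNOWN_PROCESSES` are hypothetical.

```python
# Hypothetical list of character strings indicating processes to be executed.
KNOWN_PROCESSES = ("拡大", "移動")

def split_command(text, clue_phrases=("を",)):
    """Split text into (head, tail) around the first clue phrase that occurs in it."""
    for clue in clue_phrases:
        head, sep, tail = text.partition(clue)
        if sep and head and tail:
            return head, tail
    return None

def parse(text):
    """Return (target string, process string), or None if the text does not fit."""
    parts = split_command(text)
    if parts is None:
        return None
    target, rest = parts
    # The process string is expected right after the clue phrase.
    for process in KNOWN_PROCESSES:
        if rest.startswith(process):
            return target, process
    return None
```

Here `parse("画像を拡大する")` takes "画像" from the head side as the target and "拡大" after the clue phrase as the process; text without a clue phrase or without a known process yields `None`.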

The authoring processing unit 104 may, as appropriate, convert the information indicating the processing target into information capable of identifying the processing target, and use the converted information to identify the processing target. This case may also be regarded as identifying the processing target by using the information indicating the processing target. For example, when the information indicating the processing target is the natural-language character string "画像" ("image"), this information may be converted, using a conversion table, conversion rules, or the like prepared in advance in a storage unit (not shown) or the like, into information that specifies data whose "data type" is "image" as the processing target. Such conversion is known, so a detailed description is omitted here.

Likewise, the authoring processing unit 104 may, as appropriate, convert the information indicating the process to be executed into information capable of identifying that process, and use the converted information to identify the process to be executed. This case may also be regarded as identifying the process to be executed by using the information indicating the process to be executed. For example, when the information indicating the process to be executed is the natural-language character string "大きく" ("larger"), this information may be converted, using a conversion table, conversion rules, or the like prepared in advance in a storage unit (not shown) or the like, into an instruction, command, or the like that specifies enlargement processing.

The authoring processing unit 104 may identify only the object to be processed from the result of voice recognition, or may identify only the process to be executed. For example, the authoring processing unit 104 may identify the object to be processed from the result of voice recognition and use a default process designated in advance as the process to be executed. The authoring processing unit 104 may also identify the object currently displayed by the display unit 106 (or an object that is placed in the currently displayed authoring data but is not currently displayed) as the object to be processed, and identify the process to be executed from the result of voice recognition.

For example, the authoring processing unit 104 performs authoring processing with the objects displayed by the display unit 106 as the processing target. For example, the authoring processing unit 104 performs authoring processing on those objects, among the processing target objects identified from the voice recognition result, that the display unit 106 is currently displaying. When the voice recognition result does not include information indicating a processing target, the authoring processing unit 104 may perform authoring processing with the objects currently displayed by the display unit 106 as the processing target.

The authoring processing unit 104 may perform authoring processing only on objects that the display unit 106 displays in their entirety within the display screen. For example, the authoring processing unit 104 may determine, for each object displayed by the display unit 106, whether the whole object is displayed within the display screen, identify only the objects displayed in their entirety as targets, and exclude from the targets the objects that are not displayed in their entirety. In this case as well, as described above, only those objects, among the processing target objects identified from the voice recognition result, that the display unit 106 displays in their entirety within the display screen may be identified as the processing target.
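The check for whether an object is displayed in its entirety can be sketched with axis-aligned rectangles. This is a minimal sketch that assumes each object and the display screen are described by a bounding rectangle in the same coordinate system; the `Rect` type and the function names are hypothetical.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float

def fully_visible(obj: Rect, screen: Rect) -> bool:
    """True if the object's bounding rectangle lies entirely inside the screen."""
    return (obj.left >= screen.left and obj.top >= screen.top and
            obj.right <= screen.right and obj.bottom <= screen.bottom)

def partly_visible(obj: Rect, screen: Rect) -> bool:
    """True if the object's bounding rectangle overlaps the screen at all."""
    return (obj.left < screen.right and obj.right > screen.left and
            obj.top < screen.bottom and obj.bottom > screen.top)
```

Filtering candidate objects with `fully_visible` keeps only the ones shown in their entirety; objects for which only `partly_visible` holds are the partially displayed ones that may be excluded.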

The authoring processing unit 104 may also perform authoring processing with objects not displayed by the display unit 106 as the processing target. For example, the authoring processing unit 104 may identify objects placed on a page that is not currently displayed, or objects not currently displayed within the display screen, as the processing target and perform authoring processing on them. In this case, objects that are not displayed in their entirety may or may not be included among the objects that are not displayed. In this case as well, as described above, only those objects, among the processing target objects identified from the voice recognition result, that the display unit 106 is not displaying may be identified as the processing target.

When the authoring data acquired by the authoring processing unit 104 has one or more layers, the authoring processing unit 104 may perform the authoring process on the layer of the authoring data designated through the speech recognition process. The authoring process on a layer is, for example, when the designated layer is one or more layers on which objects can be arranged, an authoring process such as the enlargement or movement described above, performed on the objects arranged on that layer. When the authoring process is performed on objects arranged on a layer, all objects arranged on the designated layer may be specified as processing targets, or, as described above, among the processing-target objects specified by the speech recognition result, only those arranged on the designated layer may be specified as processing targets. The authoring process on a layer may also be, for example, a process of changing an attribute value or the like of the layer itself. For example, when one layer is a layer that changes an attribute value of objects arranged on another layer placed below it, the authoring process may be a process of setting this attribute value, or a process of generating such a layer. For example, when one layer is a layer that changes the saturation of an image object placed below it, the authoring process may be a process of changing the amount by which this layer changes the saturation. One example of a layer that changes an attribute value of image objects placed below it in this way is what is called an adjustment layer. The process of changing an attribute value or the like of the layer itself may also be a process of changing an attribute value such as the transparency or the blend mode of the layer.
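The two kinds of layer processing described above can be sketched as follows: selecting the objects arranged on a designated layer (optionally narrowed to the objects named in the speech recognition result), and changing an attribute value held by an adjustment layer. The class names, field names, and data model are assumptions made for illustration; the specification does not define them.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class PlacedObject:
    obj_id: str
    layer: str  # name of the layer the object is arranged on (hypothetical field)


@dataclass
class AdjustmentLayer:
    name: str
    saturation_delta: float = 0.0  # amount applied to image objects on layers below


def targets_on_layer(objects: Iterable[PlacedObject], layer: str,
                     recognized_ids: Optional[Iterable[str]] = None) -> List[str]:
    """Select processing targets on the designated layer.

    With recognized_ids=None every object on the layer is selected; otherwise
    only the objects that were also named in the speech recognition result.
    """
    on_layer = [o.obj_id for o in objects if o.layer == layer]
    if recognized_ids is None:
        return on_layer
    wanted = set(recognized_ids)
    return [obj_id for obj_id in on_layer if obj_id in wanted]


def set_saturation(layer: AdjustmentLayer, delta: float) -> None:
    """Change the amount by which the adjustment layer changes saturation."""
    layer.saturation_delta = delta


objects = [PlacedObject("a", "front"), PlacedObject("b", "back"), PlacedObject("c", "front")]
all_front = targets_on_layer(objects, "front")
named_front = targets_on_layer(objects, "front", recognized_ids=["b", "c"])

adjust = AdjustmentLayer("saturation")
set_saturation(adjust, 0.25)
```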

The authoring processing unit 104 may identify a receiving target, that is, the target on which a process is performed using objects (for example, a destination page), from at least part of the speech recognition result, and perform the authoring process on the receiving target using one or more objects. For example, the authoring processing unit 104 acquires information indicating the receiving target from the speech recognition result and identifies the receiving target that this information indicates. It then performs the authoring process on this receiving target using an object. For example, the authoring processing unit 104 may move one object arranged on the currently displayed page to the page identified by the information indicating the receiving target.

For example, the authoring processing unit 104 may perform the authoring process on the processing-target object when that object can be identified from at least part of the speech recognition result, and, when no processing-target object can be identified, perform the authoring process on the objects displayed by the display unit 106.

Alternatively, the authoring processing unit 104 may perform the authoring process on the processing-target object when that object can be identified from at least part of the speech recognition result, and, when no processing-target object can be identified, perform the authoring process on all authored objects. All authored objects are, for example, all objects that the authoring process has arranged on any page, or all objects that the authoring process has caused to be included in the authoring data. However, character strings and images fixed to pages in advance, such as page numbers, and character strings and images fixed as templates or the like may be excluded from the authored objects.
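The fallback behavior described in the two preceding paragraphs can be sketched as a small selection function. The function names and the representation of objects as string ids are hypothetical; which fallback set is used (the displayed objects or all authored objects) is left to the caller.

```python
from typing import List, Sequence


def authored_objects(all_ids: Sequence[str], fixed_ids: Sequence[str]) -> List[str]:
    """All objects in the authoring data except fixed ones such as page numbers."""
    fixed = set(fixed_ids)
    return [obj_id for obj_id in all_ids if obj_id not in fixed]


def choose_targets(recognized: Sequence[str], fallback: Sequence[str]) -> List[str]:
    """Use the recognized targets when there are any, otherwise the fallback set."""
    return list(recognized) if recognized else list(fallback)


# Hypothetical ids: "page-number" is fixed to the page in advance.
everything = ["title", "photo", "page-number"]
fallback = authored_objects(everything, fixed_ids=["page-number"])

with_target = choose_targets(["photo"], fallback)
without_target = choose_targets([], fallback)
```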

The processes by which the authoring processing unit 104 generates authoring data, generates and deletes pages, generates layers, arranges and deletes objects, and changes object attributes are well-known techniques, so a detailed description is omitted here.

When an object is data with audio, the speech recognition unit 103 may acquire, from the voice received by the voice reception unit 102, voice-related information indicating the processing target, such as voice data indicating the processing target, feature values of that voice data, phonemes, or phoneme segments, as the information indicating the processing target. The authoring processing unit 104 may then use this information to perform a voice search over the audio data of the objects that are data with audio, and specify as processing targets those objects determined to contain matching audio.

The output unit 105 outputs the processing result of the authoring processing unit 104. For example, the output unit 105 outputs the authoring data acquired by the authoring processing unit 104.

Output here is a concept that includes printing on a printer, transmission to an external device, storage on a recording medium, delivery of processing results to another processing device or another program, display on a monitor or the like (not shown), and so on. For example, the output unit 105 stores the authoring data acquired by the authoring processing unit 104 in the authoring data storage unit 101. The authoring data acquired by the authoring processing unit 104 here may be considered to include authoring data acquired temporarily during the authoring process. Also, for example, the output unit 105 may store the authoring data acquired by the authoring processing unit 104 on an external storage medium or transmit it to an external device or the like. The display here may be the same as the display performed by the display unit 106 described later, or may be, for example, a display using an external device.

The output unit 105 may or may not be considered to include an output device such as a communication means. The output unit 105 can be realized by driver software for an output device, or by driver software for an output device together with the output device.

The display unit 106 displays one or more pieces of authoring data stored in the authoring data storage unit 101. The display unit 106 displays, for example, authoring data in which one or more objects are arranged. The authoring data displayed by the display unit 106 may be the authoring data of one or more pages, or the authoring data of a partial area within one or more pages. The display unit 106 may also display one or more layers of one or more pages of one piece of authoring data. The display unit 106 may display the changed authoring data each time a change is made to the authoring data, for example, in the middle of the authoring process. For example, each time the authoring data stored in the authoring data storage unit 101 is updated in response to the authoring process by the authoring processing unit 104, the display of the authoring data may be updated accordingly. The display here may also mean transmitting data for displaying the authoring data screen to an external device (not shown), or causing the external device to display the authoring data screen using that data. When the authoring device 1 is a server device, for example, this external device is a terminal device connected to the server device via a network or the like. The external device is, for example, a device provided with display means (not shown) that displays the screen using the data for displaying the authoring data screen, a display device (not shown), and the like.

The display unit 106 may or may not be considered to include a display device. The display unit 106 can be realized by driver software for a display device, or by driver software for a display device together with the display device.

Next, an example of the operation of the authoring device 1 is described with reference to the flowchart of FIG. 2. Here, a case where one or more pieces of authoring data are already stored in the authoring data storage unit 101 of the authoring device 1 is described as an example.

(Step S101) The display unit 106 determines whether to display one piece of authoring data stored in the authoring data storage unit 101. For example, the display unit 106 may decide to display the authoring data when a reception unit (not shown) receives an instruction to display one piece of authoring data. If the authoring data is to be displayed, the process proceeds to step S102; if not, the process returns to step S101.

(Step S102) The display unit 106 reads one piece of authoring data in which one or more objects are arranged from the authoring data storage unit 101 into a storage medium such as a memory (not shown), and displays it on a monitor or the like (not shown). For example, when the authoring data has a plurality of pages, the display unit 106 displays one or more pages of the authoring data designated in advance, or one or more pages designated by the user.

(Step S103) The voice reception unit 102 determines whether a voice has been received. If a voice has been received, the process proceeds to step S104; if not, the process proceeds to step S117.

(Step S104) The speech recognition unit 103 performs speech recognition processing on the voice received in step S103 and acquires text data as the speech recognition result.

(Step S105) The authoring processing unit 104 determines whether the text data acquired in step S104 contains information indicating a processing target. If it does, the authoring processing unit 104 acquires this information and the process proceeds to step S106; if not, the process proceeds to step S112. For example, the authoring processing unit 104 reads one or more character strings indicating object attribute values (for example, data types), stored in advance in a storage unit or the like (not shown), as recognition character strings representing processing targets, determines whether the text data contains a character string matching any of the read character strings, and, if so, acquires the corresponding attribute value as the information indicating the processing target. Alternatively, for example, the authoring processing unit 104 may acquire the identifiers of one or more objects arranged on the page currently displayed by the display unit 106 as recognition character strings, determine whether the text data acquired in step S104 contains a character string matching any of these identifiers, and, if so, acquire the identifier of that object as the information indicating the processing target.

(Step S106) The authoring processing unit 104 identifies the objects to be processed. For example, when the information indicating the processing target acquired in step S105 is an object identifier, the authoring processing unit 104 identifies the processing-target object by detecting, within the displayed page, the object indicated by that identifier. When the information indicating the processing target acquired in step S105 indicates an object attribute value, the authoring processing unit 104 detects the objects having that attribute value within the displayed page and identifies them as the processing-target objects. Among the objects detected here, the authoring processing unit 104 may detect those that are not displayed in their entirety within the display screen and identify the remaining detected objects as the processing targets. Whether to exclude objects that are not displayed in their entirety from the processing targets may be set in advance by the user.

(Step S107) The authoring processing unit 104 determines whether the text data acquired in step S104 contains information indicating a process to be executed. If it does, the authoring processing unit 104 reads out this information and the process proceeds to step S108; if not, the process proceeds to step S110. For example, the authoring processing unit 104 reads one or more recognition character strings, prepared in advance, that represent processes to be executed from a storage unit or the like (not shown), determines whether the text data contains a character string matching any of them, and, if so, acquires the information indicating that process.

(Step S108) The authoring processing unit 104 determines whether the text data acquired in step S104 contains information indicating a receiving target, that is, the page, layer, or the like on which the process is to be performed using the object. If it does not, the process proceeds to step S109; if it does, the process proceeds to step S111. For example, the authoring processing unit 104 acquires the information indicating receiving targets contained in the currently displayed authoring data as recognition character strings representing receiving targets, determines whether the text data contains information matching any of them, and, if so, acquires that information. Alternatively, morphological analysis and clue phrases may be used to acquire the information indicating the receiving target. For example, information indicating the page or layer that is the receiving target may be detected by detecting a clue phrase such as "page" or "layer" together with a numeral or the like found immediately before the clue phrase.
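The clue-phrase approach mentioned in step S108 can be sketched with a regular expression. Note that this is a deliberate simplification: it replaces morphological analysis with plain pattern matching, handles only digit numerals (not kanji numerals), and the function name and return format are hypothetical.

```python
import re
from typing import Optional, Tuple

# Clue phrases from the description and a hypothetical English label for each.
CLUE_PHRASES = {"ページ": "page", "レイヤ": "layer"}

# A run of digits immediately before a clue phrase, e.g. "3ページ".
_PATTERN = re.compile(r"(\d+)\s*(ページ|レイヤ)")


def find_page_or_layer(text: str) -> Optional[Tuple[str, int]]:
    """Return (kind, number) for the first 'numeral + clue phrase' found, else None."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return CLUE_PHRASES[match.group(2)], int(match.group(1))


moved = find_page_or_layer("画像を3ページに移動")   # "move the image to page 3"
plain = find_page_or_layer("画像を大きく")          # no page or layer is named
```

When the function returns `None`, the flow corresponds to the branch in which no such information is contained in the text data.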

(Step S109) The authoring processing unit 104 executes the process indicated by the information acquired in step S107 on the objects identified in step S106. The process then returns to step S103. The display unit 106 updates the display screen as appropriate according to the result of the process.

(Step S110) The authoring processing unit 104 performs a default process on the objects identified in step S106. The process then returns to step S103. The display unit 106 updates the display screen as appropriate according to the result of the process.

(Step S111) The authoring processing unit 104 executes the process indicated by the information acquired in step S107, using the objects identified in step S106, on the receiving target acquired in step S108. If the process indicated by the information acquired in step S107 cannot be executed on the receiving target acquired in step S108, the process need not be executed. The process then returns to step S103. The display unit 106 updates the display screen as appropriate according to the result of the process.

(Step S112) The authoring processing unit 104 determines whether the text data acquired in step S104 contains information indicating a process to be executed. If it does, the authoring processing unit 104 reads out this information and the process proceeds to step S113; if not, the process returns to step S103.

(Step S113) The authoring processing unit 104 determines whether the text data acquired in step S104 contains information designating a layer. If it does not, the process proceeds to step S114; if it does, the process proceeds to step S116.

(Step S114) The authoring processing unit 104 detects and identifies the objects currently displayed by the display unit 106. For example, the authoring processing unit 104 acquires the identifiers or the like of the detected objects.

(Step S115) The authoring processing unit 104 executes the authoring process indicated by the information acquired in step S112 on the objects identified in step S114. The process then returns to step S103. The display unit 106 updates the display screen as appropriate according to the result of the process.

(Step S116) The authoring processing unit 104 identifies, as processing targets, the objects arranged on the layer indicated by the layer information detected in step S113, and executes on these objects the authoring process indicated by the information acquired in step S112. The process then returns to step S103. The display unit 106 updates the display screen as appropriate according to the result of the process.

(Step S117) The output unit 105 determines whether to output the authoring data displayed by the display unit 106. For example, the output unit 105 decides to output the authoring data when a reception unit or the like (not shown) receives an instruction to output. If the authoring data is to be output, the process proceeds to step S118; if not, the process returns to step S103.

(Step S118) The output unit 105 outputs the authoring data displayed by the display unit 106. For example, the output unit 105 may transmit the authoring data to the outside, or may update (for example, overwrite) the pre-change authoring data stored in the authoring data storage unit 101 with the changed authoring data.

In the flowchart of FIG. 2, the default process of step S110 may be omitted.

In step S106 of the flowchart of FIG. 2, instead of taking at least some of the currently displayed objects as the processing targets, some or all of the non-displayed objects contained in the same authoring data (for example, the same file) as the objects displayed by the display unit 106 may be taken as the processing targets. Whether the processing targets are the currently displayed objects or the non-displayed objects may be set in advance by the user, for example, or may be specified by the user when the process is performed.

In step S114 of the flowchart of FIG. 2, all objects contained in the same authoring data (for example, the same file) as the objects displayed by the display unit 106 may be identified as processing targets, regardless of whether they are displayed.

Step S115 of the flowchart of FIG. 2 may also be omitted.

In the flowchart of FIG. 2, the process ends when the power is turned off or when an interrupt to end the process occurs.
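The branching of steps S105 to S116 can be summarized as a small decision function. This is a reading of the flowchart description, not the flowchart itself, and it rests on two assumptions: an utterance that names a processing target but no process falls through to the default process of step S110, and the four boolean inputs (hypothetical names) are the results of the checks in steps S105, S107 or S112, S108, and S113.

```python
def next_step(has_target: bool, has_process: bool,
              has_receiving_target: bool, has_layer: bool) -> str:
    """Return the step reached after analyzing one recognized utterance."""
    if has_target:                      # S105: a processing target was named
        # S106 identifies the objects before the process is looked up.
        if not has_process:             # S107: no process named
            return "S110"               # default process (assumed branch)
        if has_receiving_target:        # S108: a page, layer, etc. was named
            return "S111"
        return "S109"
    if not has_process:                 # S112: neither target nor process
        return "S103"                   # wait for the next voice input
    if has_layer:                       # S113: a layer was designated
        return "S116"
    return "S114"                       # displayed objects, then S115


enlarge_image = next_step(True, True, False, False)
move_to_page = next_step(True, True, True, False)
```

For example, an utterance that names both a target and a process but no page or layer reaches step S109, where the process is executed on the identified objects.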

A specific operation of the authoring device 1 in the present embodiment is described below.

(Specific example 1)
FIG. 3 is a diagram showing an example of the authoring device 1 of the present embodiment. Here, the authoring device 1 is assumed to be a tablet terminal that includes a microphone 102a and a monitor 106a. A touch panel (not shown) is assumed to be provided on the surface of the monitor 106a.

FIG. 4 is an authoring data management table that manages the authoring data stored in the authoring data storage unit 101. The authoring data management table has the attributes "authoring ID" and "authoring data". The "authoring ID" is the identifier of a piece of authoring data and is assumed here to be the file name of the authoring data. The "authoring data" is the authoring data itself and is assumed here to be the authoring data file. As an example, the case where the authoring data is an electronic book is described here.

First, suppose that the user operates the touch panel or the like to give a reception unit or the like (not shown) of the authoring device 1 an instruction to display the authoring data whose "authoring ID" is "001", stored in the authoring data storage unit 101. The display unit 106 then reads the authoring data associated with the authoring ID "001" from the authoring data storage unit 101 into a memory or the like, and displays it on the monitor 106a. Here, the display unit 106 displays a partial area at the upper left of the first page of this authoring data in accordance with default or other settings. Hereinafter, the authoring data whose "authoring ID" is "001" is referred to as authoring data 001.

FIG. 5 is a schematic diagram showing the page structure of the authoring data 001. The authoring data 001 is assumed to have four pages in total, from page 51, the first page, to page 54, the fourth page. Image objects and text objects are arranged on each page. On page 51, the first page, text objects 501 and 504 and image objects 502 and 503 are arranged.

FIG. 6 is a diagram showing an example of how the display unit 106 displays the authoring data 001. The monitor 106a is assumed to display the area 51a of page 51. Specifically, the monitor 106a displays the whole of the text object 501, part of the image object 502, and the whole of the image object 503 arranged on page 51.

FIG. 7 is a recognition processing target management table that manages the correspondence between the recognition character strings that the authoring processing unit 104 uses to recognize a processing target from the text data acquired by the speech recognition unit 103 as the speech recognition result, and the processing targets designated as the result of that recognition. The recognition processing target management table has the attributes "target character string" and "processing target". The "target character string" is a recognition character string for recognizing a processing target. The "processing target" is information indicating the processing target, for example, a combination of an attribute used to search for the processing target and its attribute value (attribute: attribute value). For example, "data type: image" means that objects whose data type is image are designated, and "object name: ABC" means that the object whose object name is "ABC" is designated. The [**] in a "target character string" is a regular expression indicating a sequence of one or more characters. For example, "file [**]" indicates "file ABC", "file CDE", and so on. The recognition processing target management table is stored in advance in, for example, a storage unit or the like (not shown).
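A table of this kind, including the [**] wildcard, can be sketched as follows. The rows below are hypothetical and written in English for readability; the actual contents of FIG. 7 are not reproduced in the text, and the pairing of "file [**]" with an object name is an assumption made for illustration.

```python
import re
from typing import List

WILDCARD = "[**]"

# Hypothetical rows: (target character string, processing target).
TABLE = [
    ("image", "data type: image"),
    ("text", "data type: text"),
    ("file [**]", "object name: [**]"),
]


def compile_row(target_string: str):
    """Turn a target character string into a regex; [**] matches 1+ characters."""
    head, sep, tail = target_string.partition(WILDCARD)
    if not sep:
        return re.compile(re.escape(target_string))
    # Assumption: the wildcard stands for a run of non-space characters.
    return re.compile(re.escape(head) + r"(\S+)" + re.escape(tail))


def lookup_targets(text: str) -> List[str]:
    """Return the processing targets of every row whose string occurs in the text."""
    found = []
    for target_string, processing_target in TABLE:
        match = compile_row(target_string).search(text)
        if match:
            captured = match.group(1) if match.groups() else ""
            found.append(processing_target.replace(WILDCARD, captured))
    return found


by_type = lookup_targets("make the image larger")
by_name = lookup_targets("delete file ABC")
```

The rows are scanned in order and every matching row contributes a processing target, so an utterance that matches no row yields an empty list.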

Although the attribute values of "processing target" are expressed here in natural language, a "processing target" may instead consist of, for example, one or more attribute names, attribute values, or the like used to identify the processing target that the natural-language expression describes. A "processing target" may also be data such as a search key used in the search processing, determination processing, or the like that corresponds to the natural-language expression.

FIG. 8 is a recognition process management table that manages the correspondence between the recognition character strings that the authoring processing unit 104 uses to recognize a process to be executed from the text data acquired by the speech recognition unit 103 as the speech recognition result, and the processes designated as the result of that recognition. The recognition process management table has the attributes "process character string" and "process". The "process character string" is a recognition character string for recognizing a process to be executed. The "process" is information indicating the process to be executed. The [receiving target] in a "process" is the character string of the receiving target acquired using the receiving target management table described later. The recognition process management table is stored in advance in, for example, a storage unit or the like (not shown).

Although the attribute values of "process" are expressed here in natural language, a "process" may instead consist of, for example, one or more functions or method names corresponding to the natural-language expression, control syntax such as "if" and "then", or the like. A "process" may also be an algorithm for performing the determination processing or the like that corresponds to the natural-language expression.
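One way to hold a "process" as a function rather than as natural language is a dispatch table keyed by the process character string. The sketch below is illustrative only: the English process strings, the function names, and the scaling factors are assumptions, not values taken from FIG. 8.

```python
from typing import Callable, Dict, Optional


def enlarge(size: float) -> float:
    """Hypothetical process: double the object size."""
    return size * 2.0


def shrink(size: float) -> float:
    """Hypothetical process: halve the object size."""
    return size * 0.5


# Process character string -> function that carries out the process.
PROCESSES: Dict[str, Callable[[float], float]] = {
    "larger": enlarge,
    "smaller": shrink,
}


def find_process(text: str) -> Optional[Callable[[float], float]]:
    """Return the function of the first process string contained in the text."""
    for process_string, func in PROCESSES.items():
        if process_string in text:
            return func
    return None


process = find_process("make the image larger")
new_size = process(100.0) if process else 100.0
```

Keeping the functions in a table means a new spoken command can be supported by adding one row, without changing the matching logic.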

FIG. 9 shows a to-be-processed target management table, which manages the recognition character strings that the authoring processing unit 104 uses to recognize the to-be-processed target from the text data acquired by the voice recognition unit 103 as the voice recognition result. The to-be-processed target management table has the attribute "to-be-processed target", which is a recognition character string for recognizing the to-be-processed target. In a "to-be-processed target" value, [***] is a regular expression denoting a string of one or more consecutive characters. For example, "layer [***]" denotes "layer ABC", "layer DEF", and so on.
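The [***] notation can be illustrated with a short Python sketch that converts a table entry into a regular expression. The helper names and the English pattern strings are hypothetical. Restricting [***] to non-space characters is also an assumption made here so that a match ends at a word boundary; the embodiment only requires one or more consecutive characters.

```python
import re

# Hypothetical English stand-ins for the patterns in the table of FIG. 9.
TARGET_PATTERNS = ["layer [***]", "[***] page"]

def pattern_to_regex(pattern):
    """Turn a table entry such as 'layer [***]' into a compiled regex.

    The literal parts are escaped, and each '[***]' becomes a group that
    matches one or more non-space characters (an assumption of this sketch).
    """
    literal_parts = [re.escape(part) for part in pattern.split("[***]")]
    return re.compile(r"(\S+)".join(literal_parts))

def find_target(text, patterns=TARGET_PATTERNS):
    """Return the first substring of text that matches any pattern, else None."""
    for pattern in patterns:
        match = pattern_to_regex(pattern).search(text)
        if match:
            return match.group(0)
    return None
```

A real implementation for Japanese text would likely need the morpheme boundaries mentioned in the description instead of whitespace.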

Next, suppose that the user utters "make the image larger" into the microphone 102a in order to enlarge the images displayed in FIG. 6. This utterance consists of speech indicating the processing target, "image", and speech indicating the process, "larger". The voice reception unit 102 receives the user's utterance as a voice signal via the microphone 102a, samples it, and converts it into voice data.

Suppose that the voice recognition unit 103 then performs voice recognition on the voice data acquired by the voice reception unit 102 and obtains the text data "make the image larger".

The authoring processing unit 104 sequentially reads, from each record (row) of the recognition processing-target management table shown in FIG. 7, the attribute value of "target character string", which is a recognition character string representing a processing target, and sequentially determines whether the text data acquired by the voice recognition unit 103 contains a character string that matches the value read. It then acquires the "processing target" attribute value of the same record as the "target character string" value determined to be contained in the text data.

Here, the text data acquired by the voice recognition unit 103 contains a character string that matches "image", the "target character string" of the first record from the top of FIG. 7. The "processing target" value of this record, "data type: image", is therefore acquired as the information indicating the processing target.

Next, the authoring processing unit 104 sequentially reads, from each record (row) of the recognition process management table shown in FIG. 8, the attribute value of "process character string", which is a recognition character string representing the process to be executed, and sequentially determines whether the text data acquired by the voice recognition unit 103 contains a character string that matches the value read. It then acquires the "process" attribute value of the same record as the "process character string" value determined to be contained in the text data.

Here, the text data acquired by the voice recognition unit 103 contains a character string that matches "larger", the "process character string" of the first record from the top of FIG. 8. The "process" value of this record, "enlarge by 120%", is therefore acquired. "Enlarge by 120%" is information instructing execution of a process that enlarges an object by 120%.

Furthermore, the authoring processing unit 104 sequentially reads the "to-be-processed target" attribute values of the to-be-processed target management table shown in FIG. 9, sequentially determines whether the text data acquired by the voice recognition unit 103 contains a character string that matches the value read, and acquires any value determined to be contained. In this example, the text data contains neither "layer" nor "page", so none of the "to-be-processed target" values in the records of the table shown in FIG. 9 matches. When determining whether there is a match, the start position, end position, and the like of the part corresponding to the regular expression in a "to-be-processed target" value may be determined from morpheme boundaries obtained from the text data by morphological analysis or the like.
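The three lookups described above can be sketched together in Python as follows. This is only an illustration under stated assumptions: the table contents are English stand-ins for the entries of FIGS. 7 to 9 that the description mentions, and the names `lookup` and `interpret` are hypothetical.

```python
import re

# English stand-ins for the entries of FIGS. 7 to 9 named in the description.
TARGET_TABLE = [("image", "data type: image")]
PROCESS_TABLE = [
    ("larger", "enlarge by 120%"),
    ("brighter", "increase brightness by 10%"),
    ("more to the right", "move 10 pixels to the right"),
]
# '[***]' is written here directly as a regex for non-space characters.
TO_BE_PROCESSED_PATTERNS = [r"layer \S+", r"\S+ page"]

def lookup(text, table):
    """Return the value of the first record whose key string occurs in text."""
    for key, value in table:
        if key in text:
            return value
    return None

def interpret(text):
    """Extract the processing target, process, and to-be-processed target."""
    to_be_processed = None
    for pattern in TO_BE_PROCESSED_PATTERNS:
        match = re.search(pattern, text)
        if match:
            to_be_processed = match.group(0)
            break
    return {
        "processing_target": lookup(text, TARGET_TABLE),
        "process": lookup(text, PROCESS_TABLE),
        "to_be_processed_target": to_be_processed,
    }
```

Any of the three values may be `None`; the description then falls back to the displayed objects, or to the non-displayed objects in Specific example 2.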

As described above, the information indicating the processing target and the information indicating the process were acquired from the text data acquired by the voice recognition unit 103, but information indicating a to-be-processed target could not be acquired. The authoring processing unit 104 therefore performs the authoring process indicated by "enlarge by 120%" on those currently displayed objects that are indicated by the processing-target information "data type: image" and that are displayed in their entirety on the monitor 106a.

Specifically, the authoring processing unit 104 detects objects 502 and 503, which are the currently displayed objects whose data type is image. From the detected objects 502 and 503, it then detects any object that is not displayed in its entirety on the monitor 106a, for example an object for which some of its coordinates lie outside the display area. Here, object 502 is detected and excluded from the processing targets. The authoring processing unit 104 then enlarges the size of object 503, which remains as a processing target, by 120%. The display unit 106 then updates the display on the monitor 106a with the authoring data obtained in this way.
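The exclusion of partly displayed objects can be illustrated with the following Python sketch. The `Obj` class, its fields, the coordinate system (origin at the top-left corner of the display area), and the sample coordinates are all assumptions made for illustration, as is the reading of "enlarge by 120%" as a scale factor of 1.2.

```python
from dataclasses import dataclass

@dataclass
class Obj:
    obj_id: int
    data_type: str
    x: float       # left edge, relative to the display area
    y: float       # top edge, relative to the display area
    width: float
    height: float

def fully_visible(obj, view_w, view_h):
    """True when the whole bounding box lies inside the display area."""
    return (obj.x >= 0 and obj.y >= 0
            and obj.x + obj.width <= view_w
            and obj.y + obj.height <= view_h)

def enlarge_visible_images(objects, view_w, view_h, factor=1.2):
    """Scale fully visible image objects; return the ids that were changed."""
    changed = []
    for obj in objects:
        if obj.data_type == "image" and fully_visible(obj, view_w, view_h):
            obj.width *= factor
            obj.height *= factor
            changed.append(obj.obj_id)
    return changed

# Hypothetical layout in which object 502 extends below the display area.
objects = [
    Obj(501, "text", 10, 10, 100, 20),
    Obj(502, "image", 10, 450, 100, 100),
    Obj(503, "image", 150, 50, 100, 100),
]
changed_ids = enlarge_visible_images(objects, view_w=400, view_h=500)
```

In this layout only object 503 is both an image and fully visible, so it is the only object resized.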

FIG. 10 shows the authoring data displayed by the display unit 106 after the authoring process.

When the target of the enlargement process is a text object, either the area in which the text is placed may be enlarged or the character size of the text may be changed.

Although objects that are not displayed in their entirety on the monitor 106a are excluded from the processing targets here, they need not be excluded. For example, whether to exclude them may be switchable according to a user instruction or the like.

The case described here is one in which the text data acquired by the voice recognition unit 103 contains the character string "larger". If the text data instead contained the character string "brighter", the authoring processing unit 104 would process the text data in the same manner as above and acquire, from the recognition process management table shown in FIG. 8, information indicating the process "increase brightness by 10%". The process to be executed by the authoring processing unit 104 is thereby identified as increasing the brightness of image objects by 10%. The authoring processing unit 104 then performs this process on the currently displayed image objects identified above, so that their brightness values are raised by 10% and the displayed images become brighter.

Suppose also that, in the state shown in FIG. 6, the voice that the voice reception unit 102 receives from the user is "copy the image to page 4", and that the voice recognition unit 103 obtains the text data "copy the image to the fourth page" as the voice recognition result.

The authoring processing unit 104 processes the text data obtained as the voice recognition result in the same manner as above and acquires "data type: image" as the information indicating the processing target, "copy to [to-be-processed target]" as the information indicating the process, and "fourth page" as the information indicating the to-be-processed target. Specifically, the text data contains the text "fourth page", which matches "[***] page", the character string containing a regular expression in the second record from the top of the to-be-processed target management table shown in FIG. 9. The authoring processing unit 104 therefore acquires the matching text "fourth page" from the text data as the information indicating the to-be-processed target.

The authoring processing unit 104 then copies objects 502 and 503, which are the currently displayed objects whose data type is image, to the fourth page. Specifically, it duplicates objects 502 and 503 and places the duplicates on the fourth page. Here, objects 502a and 503a, which are duplicates of objects 502 and 503 on page 51 (the first page), are placed at the same positions (for example, the same coordinates) on page 54 (the fourth page). The copy-source page and the page on which the copied objects are to be placed can be detected by, for example, associating the identifiers "first page" to "fourth page" with pages 51 to 54 in advance.
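The copy process can be sketched in Python as follows. The dictionary-based page model, the field names, the coordinates, and the function name are hypothetical; only the page identifiers, the object numbers, and the "a" suffix of the duplicates follow the description.

```python
import copy

def copy_objects_to_page(pages, source_page, dest_page, data_type):
    """Duplicate objects of data_type from source_page onto dest_page.

    Each duplicate keeps the coordinates of its original and receives the
    original's id with an 'a' suffix, as with objects 502a and 503a.
    """
    duplicates = []
    for obj in pages[source_page]:
        if obj["type"] == data_type:
            dup = copy.deepcopy(obj)
            dup["id"] = obj["id"] + "a"
            duplicates.append(dup)
    pages[dest_page].extend(duplicates)
    return duplicates

# Pages keyed by the identifiers associated with them in advance.
pages = {
    "first page": [
        {"id": "501", "type": "text", "x": 0, "y": 0},
        {"id": "502", "type": "image", "x": 10, "y": 20},
        {"id": "503", "type": "image", "x": 30, "y": 40},
    ],
    "fourth page": [],
}
copies = copy_objects_to_page(pages, "first page", "fourth page", "image")
```

Because `copy.deepcopy` is used, later edits to a duplicate do not affect the original object on the first page.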

As a result, authoring data 001, whose "authoring ID" is "001", is changed as shown in FIG. 11.

Suppose also that, in the state shown in FIG. 6, the user utters "more to the right" into the microphone 102a. This utterance instructs a move to the right. The voice reception unit 102 receives the user's utterance as a voice signal via the microphone 102a, samples it, and converts it into voice data. Suppose that the voice recognition unit 103 then performs voice recognition and obtains the text data "more to the right".

As above, the authoring processing unit 104 sequentially reads the "target character string" attribute values of the recognition processing-target management table shown in FIG. 7 and sequentially determines whether the text data acquired by the voice recognition unit 103 contains a matching character string. Here, suppose that none is determined to be contained. In other words, the authoring processing unit 104 could not identify a processing target from the voice recognition result.

The authoring processing unit 104 then, as above, sequentially reads the "process character string" attribute value from each record (row) of the recognition process management table shown in FIG. 8 and, when it determines that the text data acquired by the voice recognition unit 103 contains a matching character string, acquires the "process" attribute value of the record corresponding to that value.

Here, the acquired text data contains a character string that matches "more to the right", the "process character string" of the sixth record from the top of FIG. 8, so the "process" value of this record, "move 10 pixels to the right", is acquired. This process moves an object 10 pixels to the right and is the process identified as the one to be performed on the processing targets.

Furthermore, the authoring processing unit 104 determines whether the text data acquired by the voice recognition unit 103 contains information specifying a layer. Specifically, the authoring processing unit 104 determines whether the acquired text data contains a character string that matches "layer [***]", a character string containing a regular expression that is the same as the attribute value in the first row from the top of the to-be-processed target management table shown in FIG. 9. Here, it is determined that no matching character string is contained.

The authoring processing unit 104 therefore detects objects 501 to 503, which are all the objects that the display unit 106 is currently displaying on the monitor 106a, and identifies them as the processing targets.

Here too, as above, objects of which a part is not displayed may be excluded from the processing targets. In addition, instead of all currently displayed objects, the authoring processing unit 104 may identify as the processing targets the objects placed on page 51, which is the page of which at least a part is currently displayed.

The authoring processing unit 104 then performs the process identified above on each of objects 501 to 503 identified above, that is, it moves the position (here, the coordinates) of each object 10 pixels to the right. The display unit 106 then updates the display on the monitor 106a with the authoring data obtained in this way.
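The fallback to all displayed objects when no processing target is recognized can be sketched in Python as follows. The function names, the dictionary layout, and the coordinates are hypothetical; the object numbers and the 10-pixel shift follow the description.

```python
def select_targets(displayed_objects, data_type=None):
    """Return displayed objects of data_type, or all of them when it is None."""
    if data_type is None:
        # No processing target was recognized: fall back to every
        # object currently displayed on the monitor.
        return list(displayed_objects)
    return [obj for obj in displayed_objects if obj["type"] == data_type]

def move_right(objects, dx=10):
    """Shift the x coordinate of each object by dx pixels."""
    for obj in objects:
        obj["x"] += dx

displayed = [
    {"id": 501, "type": "text", "x": 0},
    {"id": 502, "type": "image", "x": 20},
    {"id": 503, "type": "image", "x": 40},
]
# "more to the right" names no processing target, so data_type stays None.
targets = select_targets(displayed, data_type=None)
move_right(targets)
```

Passing `data_type="image"` instead would restrict the move to objects 502 and 503.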

FIG. 12 shows the authoring data displayed by the display unit 106 after the authoring process.

FIG. 13 is a schematic diagram for explaining a case in which page 51 has a plurality of layers. Suppose, for example, that page 51, the first page of authoring data 001 shown in FIG. 5, consists of a layer 51a and a layer 51b as shown in FIG. 13. Suppose that objects 501 and 502 are placed on layer 51a and that objects 503 and 504 are placed on layer 51b. Suppose also that the layer name "layer LF" is associated with layer 51a and the layer name "layer LB" is associated with layer 51b.

In such a case, suppose that, with page 51 displayed as in FIG. 6, the user utters "make layer LF larger" into the microphone 102a. This utterance consists of speech indicating the to-be-processed target, "layer LF", and speech indicating the process, "larger". The voice reception unit 102 receives the user's utterance as a voice signal via the microphone 102a, samples it, and converts it into voice data. Suppose that the voice recognition unit 103 then performs voice recognition and obtains the text data "make layer LF larger".

As above, the authoring processing unit 104 sequentially reads the "target character string" attribute values of the recognition processing-target management table shown in FIG. 7 and sequentially determines whether this text data contains a matching character string. Here, suppose that none is determined to be contained.

The authoring processing unit 104 then, as above, sequentially reads the "process character string" attribute value from each record (row) of the recognition process management table shown in FIG. 8, determines whether this text data contains a matching character string, and, as above, acquires "enlarge by 120%" as the information identifying the process to be performed on the processing targets.

Furthermore, the authoring processing unit 104 determines whether the text data acquired by the voice recognition unit 103 contains a character string that matches "layer [***]". Here, "layer LF" matches this character string, so it is determined that a matching character string is contained.

The authoring processing unit 104 therefore detects objects 501 and 502, which are all the objects placed on layer 51a, the layer whose layer name is "layer LF" on page 51, a part of which the display unit 106 is currently displaying on the monitor 106a, and identifies them as the processing targets.

Here too, as above, objects of which a part is not displayed may be excluded from the processing targets.

The authoring processing unit 104 then performs the process identified above, that is, enlarging the size of an object by 120%, on each of objects 501 and 502 placed on layer 51a identified above. The display unit 106 then updates the display on the monitor 106a with the authoring data obtained in this way. In this way, authoring data in which objects 501 and 502 of authoring data 001 shown in FIG. 5 are enlarged by 120% can be obtained and displayed.
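The layer-based selection can be illustrated with the following Python sketch. Modelling a page as a dictionary from layer name to object list is an assumption, as are the function name, the object sizes, and the reading of "enlarge by 120%" as a scale factor of 1.2; the layer names and object numbers follow FIG. 13 as described.

```python
import re

def enlarge_layer(page, text, factor=1.2):
    """Scale every object on the layer named in text; return the affected ids."""
    match = re.search(r"layer \S+", text)  # stand-in for "layer [***]"
    if match is None:
        return []
    objects = page.get(match.group(0), [])
    for obj in objects:
        obj["width"] *= factor
        obj["height"] *= factor
    return [obj["id"] for obj in objects]

# Page 51 modelled as a mapping from layer name to the objects on that layer.
page_51 = {
    "layer LF": [
        {"id": 501, "width": 100.0, "height": 20.0},
        {"id": 502, "width": 50.0, "height": 50.0},
    ],
    "layer LB": [
        {"id": 503, "width": 80.0, "height": 80.0},
        {"id": 504, "width": 40.0, "height": 40.0},
    ],
}
affected = enlarge_layer(page_51, "make layer LF larger")
```

Objects on "layer LB" are left unchanged because the recognized text names only "layer LF".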

(Specific example 2)
The specific example above described the case in which, when the object to be processed cannot be identified from the voice recognition result of the voice recognition unit 103, the process is performed on the objects that the display unit 106 is displaying on the monitor 106a or on the objects placed on the page that the display unit 106 is displaying. The following describes an example in which the authoring processing unit 104 performs the process on all objects placed in the authoring data other than the currently displayed objects.

For example, suppose that, with authoring data 001 displayed on the monitor 106a as in the specific example above, the user utters "larger" into the microphone 102a. This utterance consists of speech indicating the process "larger". The voice reception unit 102 receives the user's utterance as a voice signal via the microphone 102a, samples it, and converts it into voice data. Suppose that the voice recognition unit 103 then performs voice recognition and obtains the text data "larger".

As above, the authoring processing unit 104 sequentially reads the "process character string" attribute value from each record (row) of the recognition process management table shown in FIG. 8 and determines whether the text data acquired by the voice recognition unit 103 contains a matching character string. When it determines that one is contained, it acquires the "process" attribute value of the record corresponding to that value.

Here, the acquired text data contains a character string that matches "larger", the "process character string" of the first record from the top of FIG. 8, so the "process" value of this record, "enlarge by 120%", is acquired. This is the process identified as the one to be performed on the processing targets.

Furthermore, the authoring processing unit 104 determines whether the text data acquired by the voice recognition unit 103 contains information specifying a layer. Specifically, the authoring processing unit 104 determines whether the acquired text data contains a character string that matches "layer [***]", a character string containing a regular expression that is the same as the attribute value in the first row from the top of the to-be-processed target management table shown in FIG. 9. Here, suppose it is determined that no matching character string is contained.

Since the object to be processed thus cannot be identified from the voice recognition result and no layer is detected, the authoring processing unit 104 detects, among the objects placed in authoring data 001, the objects other than objects 501 to 503, which the display unit 106 is currently displaying on the monitor 106a, that is, all objects that are not currently displayed, and identifies them as the processing targets.

An object of which only a part is displayed, such as object 502, may be regarded as an object that is not currently displayed and added to the processing targets. In addition, instead of all currently displayed objects, the authoring processing unit 104 may exclude from the processing targets all objects placed on page 51, which is the page of which at least a part is currently displayed (here, for example, objects 501 to 504).

The authoring processing unit 104 then performs the process identified above, that is, enlarging the size of an object by 120%, on each of the objects identified above, namely all objects of authoring data 001 other than objects 501 to 503. The display unit 106 then updates the display on the monitor 106a with the authoring data obtained in this way.
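Selecting the non-displayed objects of Specific example 2 can be sketched in Python as follows. The list-of-pages model and the function name are hypothetical, and the object ids 601 and 701 are invented purely for illustration, since the description does not number the objects on pages 52 to 54.

```python
def non_displayed_objects(authoring_data, displayed_ids):
    """Return every object in the authoring data whose id is not displayed."""
    return [
        obj
        for page in authoring_data
        for obj in page
        if obj["id"] not in displayed_ids
    ]

# Authoring data as a list of pages, each page a list of objects.
authoring_data = [
    [{"id": 501, "width": 100.0}, {"id": 502, "width": 50.0},
     {"id": 503, "width": 80.0}, {"id": 504, "width": 40.0}],
    [{"id": 601, "width": 10.0}],  # hypothetical object on another page
    [{"id": 701, "width": 20.0}],  # hypothetical object on another page
]
targets = non_displayed_objects(authoring_data, displayed_ids={501, 502, 503})
for obj in targets:
    obj["width"] *= 1.2  # assumes "enlarge by 120%" means a factor of 1.2
```

To treat the partly displayed object 502 as non-displayed, it would simply be left out of `displayed_ids`.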

FIG. 14 is a schematic diagram showing the page structure of authoring data 001 as changed by the authoring process. As shown in FIG. 14, the non-displayed objects, that is, the objects placed on pages 52 to 54 and object 504, are all enlarged by 120% relative to FIG. 5.

It goes without saying that treating non-displayed objects as the processing targets may also be applied, as appropriate, to processes other than the one described above.

As described above, according to the present embodiment, authoring processing can be performed on objects by using voice. This makes it possible to perform authoring processing easily even when, for example, both hands are occupied.

In each of the above embodiments, each process (each function) may be realized by centralized processing in a single device (system) or by distributed processing across a plurality of devices.

Although the above embodiments describe the case in which the authoring device is stand-alone, the authoring device may be either a stand-alone device or a server device in a server-client system. In the latter case, the reception unit, the output unit, the display unit, and the like receive input via a communication line or output data for displaying a screen and the like via a communication line.

In each of the above embodiments, each component may be configured by dedicated hardware, or a component that can be realized by software may be realized by executing a program. For example, each component can be realized by a program execution unit such as a CPU reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory. During execution, the program execution unit may execute the program while accessing a storage unit (for example, a recording medium such as a hard disk or a memory).

The software that realizes the authoring device in each of the above embodiments is the following program. That is, this program causes a computer that can access an authoring data storage unit, in which authoring data in which one or more objects are arranged is stored, to function as: a voice reception unit that receives voice; a voice recognition unit that performs voice recognition processing on the voice received by the voice reception unit; an authoring processing unit that performs authoring processing on the objects arranged in the authoring data stored in the authoring data storage unit in accordance with the result of the voice recognition processing by the voice recognition unit; and an output unit that outputs the processing result of the authoring processing unit.

The functions realized by the above program do not include functions that can be realized only by hardware. For example, functions that can be realized only by hardware such as a modem or an interface card in an acquisition unit that acquires information or an output unit that outputs information are not included in the functions realized by the program.

The program may be executed by a single computer or by a plurality of computers. That is, either centralized processing or distributed processing may be performed.

FIG. 15 is a schematic diagram showing an example of the external appearance of a computer that executes the above program and realizes the authoring device according to the above embodiment. The above embodiment can be realized by computer hardware and a computer program executed on that hardware.

In FIG. 15, a computer system 900 includes a computer 901 that includes a CD-ROM (Compact Disc Read-Only Memory) drive 905, a keyboard 902, a mouse 903, and a monitor 904.

FIG. 16 is a diagram showing the internal configuration of the computer system 900. In FIG. 16, the computer 901 includes, in addition to the CD-ROM drive 905, an MPU (Micro Processing Unit) 911; a ROM 912 for storing programs such as a boot-up program; a RAM (Random Access Memory) 913 that is connected to the MPU 911, temporarily stores instructions of application programs, and provides a temporary storage space; a hard disk 914 that stores application programs, system programs, and data; and a bus 915 that interconnects the MPU 911, the ROM 912, and the other components. The computer 901 may also include a network card (not shown) that provides a connection to a LAN.

The program that causes the computer system 900 to execute the functions of the authoring device according to the above embodiments may be stored on a CD-ROM 921, which is inserted into the CD-ROM drive 905, and transferred to the hard disk 914. Alternatively, the program may be transmitted to the computer 901 via a network (not shown) and stored on the hard disk 914. The program is loaded into the RAM 913 at execution time. The program may also be loaded directly from the CD-ROM 921 or from the network.

The program need not include an operating system (OS), third-party programs, or the like that cause the computer 901 to execute the functions of the authoring device according to the above embodiments. The program may include only those instructions that call the appropriate functions (modules) in a controlled manner so that the desired result is obtained. How the computer system 900 operates is well known, and a detailed description is omitted.

The present invention is not limited to the above embodiments. Various modifications are possible, and it goes without saying that they are also included within the scope of the present invention.

As described above, the authoring device and the like according to the present invention are suitable as a device or the like that performs authoring, and are particularly useful as a device or the like that performs authoring processing on objects arranged in authoring data.

DESCRIPTION OF SYMBOLS
1 Authoring device
101 Authoring data storage unit
102 Voice reception unit
102a Microphone
103 Voice recognition unit
104 Authoring processing unit
105 Output unit
106 Display unit
106a Monitor

Claims

An authoring device comprising:
an authoring data storage unit that stores authoring data in which one or more objects are arranged;
a voice reception unit that receives voice;
a voice recognition unit that performs voice recognition processing on the voice received by the voice reception unit;
an authoring processing unit that performs authoring processing on an object arranged in the authoring data stored in the authoring data storage unit, according to the result of the voice recognition processing by the voice recognition unit;
an output unit that outputs the processing result of the authoring processing unit; and
a display unit that displays the authoring data stored in the authoring data storage unit,
wherein the authoring processing unit performs the authoring processing on the object to be processed when that object can be identified from at least a part of the voice recognition result, and performs the authoring processing on the object displayed by the display unit when the object to be processed cannot be identified.

The authoring device according to claim 1, wherein the authoring processing performed by the authoring processing unit is processing that changes an attribute of the object.

The authoring device according to claim 2, wherein the authoring processing performed by the authoring processing unit is processing that changes the position of the object.

The authoring device according to claim 2, wherein the authoring processing performed by the authoring processing unit is processing that changes the color of the object.

The authoring device according to any one of claims 1 to 4, wherein the object integrally includes display data, which is data to be displayed, and audio data, which is data of sound.

An authoring method performed using an authoring data storage unit that stores authoring data in which one or more objects are arranged, a voice reception unit, a voice recognition unit, an authoring processing unit, an output unit, and a display unit, the method comprising:
a voice reception step in which the voice reception unit receives voice;
a voice recognition step in which the voice recognition unit performs voice recognition processing on the voice received in the voice reception step;
an authoring processing step in which the authoring processing unit performs authoring processing on an object arranged in the authoring data stored in the authoring data storage unit, according to the result of the voice recognition processing in the voice recognition step;
an output step in which the output unit outputs the processing result of the authoring processing step; and
a display step in which the display unit displays the authoring data stored in the authoring data storage unit,
wherein, in the authoring processing step, the authoring processing is performed on the object to be processed when that object can be identified from at least a part of the voice recognition result, and on the object displayed in the display step when the object to be processed cannot be identified.

A program for causing a computer that can access an authoring data storage unit, which stores authoring data in which one or more objects are arranged, to function as:
a voice reception unit that receives voice;
a voice recognition unit that performs voice recognition processing on the voice received by the voice reception unit;
an authoring processing unit that performs authoring processing on an object arranged in the authoring data stored in the authoring data storage unit, according to the result of the voice recognition processing by the voice recognition unit;
an output unit that outputs the processing result of the authoring processing unit; and
a display unit that displays the authoring data stored in the authoring data storage unit,
wherein the authoring processing unit performs the authoring processing on the object to be processed when that object can be identified from at least a part of the voice recognition result, and performs the authoring processing on the object displayed by the display unit when the object to be processed cannot be identified.
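
For illustration only, the target selection recited in the claims above (use the object identified from the recognition result, otherwise fall back to the object displayed by the display unit) might be sketched as follows. This is a sketch under stated assumptions, not the claimed implementation: the names `AuthoringObject`, `COLORS`, and `apply_color_command`, the matching of object names word by word, the restriction to a color change, and the treatment of the displayed object as a single object are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumed color vocabulary; the source does not define one.
COLORS = {"red", "blue", "green"}


@dataclass
class AuthoringObject:
    """One object arranged in the authoring data (hypothetical structure)."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


def apply_color_command(
    recognized: str,
    objects: Dict[str, AuthoringObject],
    displayed: AuthoringObject,
) -> AuthoringObject:
    """Change the color of the target object and return that object.

    The target is the first object whose name appears in the recognition
    result; if no name is found, the currently displayed object is used.
    """
    words = recognized.lower().split()
    target = next((objects[w] for w in words if w in objects), displayed)
    color = next((w for w in words if w in COLORS), None)
    if color is not None:
        target.attributes["color"] = color
    return target


objects = {"logo": AuthoringObject("logo"), "title": AuthoringObject("title")}
displayed = objects["title"]

named = apply_color_command("make the logo red", objects, displayed)
fallback = apply_color_command("make it blue", objects, displayed)
```

Here the first utterance names an object, so `logo` is changed; the second names none, so the change is applied to the displayed `title` object.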