JP2002269530A

JP2002269530A - Robot, behavior control method of the robot, program and storage medium

Info

Publication number: JP2002269530A
Application number: JP2001071053A
Authority: JP
Inventors: Hiroaki Ogawa; 浩明小川
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2001-03-13
Filing date: 2001-03-13
Publication date: 2002-09-20

Abstract

PROBLEM TO BE SOLVED: To control the appearance probability of a behavior that a robot has learned. SOLUTION: The robot comprises a learning unit 1 that learns a new behavior, a detection unit 2 that detects an external sensor signal, an evaluation unit 3 that evaluates the sensor signal detected by the detection unit 2, and an association unit 4 that weights the behavior learned by the learning unit 1 according to the evaluation made by the evaluation unit 3.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a robot apparatus that acts autonomously, a behavior control method for such a robot apparatus, a program for controlling the behavior of such a robot apparatus, and a recording medium on which such a program is recorded.

[0002]

[Description of the Related Art] An autonomous entertainment robot apparatus produces a variety of behaviors by automatically reproducing data stored in advance (specifically, behavior pattern data) according to internal states such as emotion and instinct.

[0003] On the other hand, robot apparatuses have also been proposed that, instead of reproducing predetermined behavior pattern data, behave according to the surrounding environment and situation. That is, rather than holding behavior pattern data and acting on it, such a robot apparatus produces ad hoc behavior in response to the external environment and the like.

[0004] Specifically, Japanese Patent Application Laid-Open No. 2000-122992 proposes a technique for a robot apparatus that acts autonomously, selecting its range of activity according to the external environment and the like, by using a reward as the criterion for its motivation to act.

[0005] In addition, Japanese Patent Application Laid-Open No. H11-126198 proposes a technique for learning behavior using a recurrent neural network (hereinafter, RNN). With this technique, learning with RNNs makes it possible to acquire a series of behaviors in segmented form, and further to acquire hierarchically a higher-level structure that segments the sequence of segmented behaviors, and a still higher-level structure above that. According to this technique, the robot apparatus segments various motions, such as "go straight toward the exit", "leave the room", "turn right in the corridor" or "go straight down the corridor", according to the individual learning situation, and acts by combining those motions.

[0006] A robot apparatus that can learn behavior in segmented form in this way can acquire a variety of behaviors through learning, depending on its user. That is, because the learning environment differs, the robot apparatus can acquire various motions according to the environment in which it operates. In other words, because the environment in which the robot apparatus is taught differs from user to user (for example, from owner to owner), the robot apparatus can acquire various motions according to that environment.

[0007] Compared with a robot apparatus that can act only by reproducing behavior pattern data as described above, such a robot apparatus behaves in a manner suited to its own environment, so the user can enjoy watching it as something that behaves autonomously in a more natural way.

[0008]

[Problems to Be Solved by the Invention] A robot apparatus may, however, learn undesirable behaviors. For example, behaviors such as "bumping into a vase" or "going out of a window" are undesirable. Such undesirable behaviors need to be suppressed. On the other hand, when the robot apparatus has learned a desirable behavior, one wants it to perform that behavior again.

[0009] The present invention has been made in view of the circumstances described above, and its object is to provide a robot apparatus, a behavior control method for a robot apparatus, a program, and a recording medium that make it possible to control the appearance probability of learned behaviors.

[0010]

[Means for Solving the Problems] To solve the problems described above, a robot apparatus according to the present invention comprises input signal detection means for detecting an external input signal from outside, evaluation means for evaluating the external input signal detected by the input signal detection means, association means for associating the evaluation result obtained by the evaluation means with behavior content information, and behavior control means for controlling behavior on the basis of the behavior content information, according to the evaluation associated by the association means.

[0011] A robot apparatus having this configuration evaluates, with the evaluation means, the external input signal detected by the input signal detection means; associates, with the association means, the evaluation result with behavior content information; and controls behavior, with the behavior control means, on the basis of the behavior content information according to the associated evaluation. The robot apparatus thereby evaluates a learned behavior and produces behavior on the basis of that evaluation.

[0012] To solve the problems described above, a behavior control method for a robot apparatus according to the present invention comprises an input signal detection step in which the robot apparatus detects an external input signal from outside, an evaluation step in which the robot apparatus evaluates the external input signal detected in the input signal detection step, an association step in which the robot apparatus associates the evaluation result obtained in the evaluation step with behavior content information, and a behavior control step in which the robot apparatus controls behavior on the basis of the behavior content information, according to the evaluation associated in the association step. With this behavior control method, the robot apparatus evaluates a learned behavior and produces behavior on the basis of that evaluation.

[0013] To solve the problems described above, a program according to the present invention causes a robot apparatus to execute an input signal detection step of detecting an external input signal from outside, an evaluation step of evaluating the external input signal detected in the input signal detection step, an association step of associating the evaluation result obtained in the evaluation step with behavior content information, and a behavior control step of controlling behavior on the basis of the behavior content information, according to the evaluation associated in the association step. A robot apparatus whose behavior is controlled by this program evaluates a learned behavior and produces behavior on the basis of that evaluation.

[0014] To solve the problems described above, a recording medium according to the present invention has recorded on it a program that causes a robot apparatus to execute an input signal detection step of detecting an external input signal from outside, an evaluation step of evaluating the external input signal detected in the input signal detection step, an association step of associating the evaluation result obtained in the evaluation step with behavior content information, and a behavior control step of controlling behavior on the basis of the behavior content information, according to the evaluation associated in the association step. A robot apparatus whose behavior is controlled by the program recorded on this recording medium evaluates a learned behavior and produces behavior on the basis of that evaluation.

[0015]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. In these embodiments, the present invention is applied to a robot apparatus that acts autonomously.

[0016] The robot apparatus to which the present invention is applied is an autonomous robot apparatus that acts autonomously according to its surroundings and its internal state. By applying the present invention, the robot apparatus learns new behaviors, assigns to each learned behavior an evaluation of it, for example a weight, and then acts on the basis of that weight.

[0017] In the description of the embodiment, the learning of behavior by a robot apparatus realized by applying the present invention is described first, followed by the specific configuration of the robot apparatus.

[0018] (1) Learning of behavior and weighting of learned behavior

As shown in FIG. 1, the robot apparatus realizes the present invention by comprising a learning unit 1, a detection unit 2, an evaluation unit 3, and an association unit 4. The learning unit 1 functions as learning means for learning new behaviors; the detection unit 2 functions as input signal detection means for detecting an external input signal from outside; the evaluation unit 3 functions as evaluation means for evaluating the external input signal detected by the detection unit 2; and the association unit 4 functions as association means for assigning, to a behavior learned by the learning unit 1, a weight representing the evaluation made by the evaluation unit 3. For example, the learning unit 1, detection unit 2, evaluation unit 3, and association unit 4 are implemented in the robot apparatus as objects or modules of a software program.

[0019] The learning unit 1 is constituted by a learning model such as a recurrent neural network (hereinafter, RNN). Here, the RNN is a neural network in which information on the behavior to be learned is passed through an input layer, an intermediate layer, and an output layer. The processing performed by the RNN during behavior learning is described in detail later.

[0020] When an input is supplied, the learning unit 1 produces a corresponding output and learns the input as a learning target. The input is supplied as time-series data. Examples of inputs for behavior learning include sensor inputs and motor outputs obtained as the robot apparatus acts; a specific example of a sensor input is an image signal. The information supplied to the learning unit 1 as a learning target is not limited to these, and may be, for example, behavior information generated internally in correspondence with a behavior.

[0021] The learning unit 1 thus produces an output in response to an input to be learned. The output of the learning unit 1 is behavior content information indicating the content of a behavior, and can be regarded as a so-called teaching signal. After learning, this output takes similar values as long as the same behavior is being produced.

[0022] Meanwhile, the detection unit 2 detects an external input signal. The detection unit 2 is, for example, a sensor; specifically, as described later, it is a touch sensor located on the top of the head of the robot apparatus. Using the touch sensor as an interface with the user, the robot apparatus detects that it has been "stroked" or "hit" by the user.

[0023] The evaluation unit 3 performs an evaluation on the basis of the signal detected by the detection unit 2. For example, it detects an act performed by the user, such as "stroking" or "praising", and generates a weight as an evaluation value. For the evaluation, signal patterns to be evaluated are prepared in advance, for example, and the act performed by the user is evaluated by comparing the signal pattern detected by the detection unit 2 with the prepared signal patterns.
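The pattern comparison described in paragraph [0023] could be sketched as follows. This is only a minimal illustration: the template traces, the fixed trace length, the squared-distance measure, and the names `TEMPLATES`, `WEIGHT`, and `evaluate` are assumptions made for the example, not details given in the publication.

```python
# Hypothetical pre-stored touch-pressure traces (assumed values, four samples each).
TEMPLATES = {
    "stroked": [0.2, 0.3, 0.3, 0.2],  # gentle, sustained pressure
    "hit": [0.0, 1.0, 0.1, 0.0],      # short, strong spike
}

# Assumed mapping from the recognized act to a weight (evaluation value).
WEIGHT = {"stroked": +1, "hit": -1}


def evaluate(signal):
    """Return (label, weight) of the stored pattern closest to the detected signal."""

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    label = min(TEMPLATES, key=lambda name: squared_distance(signal, TEMPLATES[name]))
    return label, WEIGHT[label]
```

A real evaluation unit would need to cope with traces of varying length and with noise; nearest-template matching is used here only because it is the simplest comparison that fits the description.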

[0024] The detection unit 2 is not limited to a touch sensor; anything that allows an act performed by the user to be evaluated will do. For example, the detection unit 2 may be a microphone. In that case, the evaluation unit 3 can also determine the user's evaluation from the user's voice picked up by the microphone, for example from the tone of voice. That is, the evaluation unit 3 identifies and evaluates utterances by the user such as "no" or "good".

[0025] The association unit 4 associates the weight obtained by the evaluation unit 3 with the output from the learning unit 1. The association unit 4 is, for example, storage means, and realizes the association by storing the output from the learning unit 1 and the weight as a pair.
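As a rough illustration of paragraph [0025], the association unit can be thought of as a small key-value store. The sketch below is an assumption-laden example: the class name `AssociationStore`, the use of a string identifier to stand in for the learning unit's output, and the choice to accumulate repeated evaluations are all hypothetical; the publication only states that the output and the weight are stored as a pair.

```python
from dataclasses import dataclass, field


@dataclass
class AssociationStore:
    """Hypothetical stand-in for association unit 4."""

    # Maps a behavior identifier (standing in for the learning unit's output) to its weight.
    weights: dict = field(default_factory=dict)

    def associate(self, behavior_id: str, weight: float) -> None:
        # Assumption: repeated evaluations of the same behavior are summed.
        self.weights[behavior_id] = self.weights.get(behavior_id, 0.0) + weight

    def weight_of(self, behavior_id: str) -> float:
        # Behaviors that were never evaluated are treated as neutral (weight 0).
        return self.weights.get(behavior_id, 0.0)
```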

[0026] The weight is associated with the behavior (output) learned by the learning unit 1 at the same time as, or shortly before or after, the external input signal detected by the detection unit 2. This reflects the fact that, with a pet for example, the user normally performs the evaluating act, such as "stroking" or "hitting", at the same time as, or shortly before or after, the behavior being taught. Needless to say, however, the association is not limited to this timing.

[0027] With the configuration shown in FIG. 1 as described above, the robot apparatus can learn new behaviors and can also weight the behaviors it has learned. The robot apparatus can thereby control its learned behaviors according to the weights. Specifically, stroking the robot apparatus on the head when it has learned a behavior teaches it that the behavior is desirable, so that it comes to produce that behavior frequently; conversely, hitting it on the head when it has learned a behavior teaches it that the behavior is undesirable, so that it comes to produce that behavior hardly at all. In other words, the user (owner) can "discipline" the behavior of the robot apparatus.

[0028] Specifically, such behavior control can be realized by a behavior control unit (not shown), which controls the behavior of the robot apparatus, determining the appearance probability of each behavior with reference to the weights. For example, the appearance probability of a behavior is raised when its weight is large and lowered when its weight is small.
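One way to turn weights into appearance probabilities, as paragraph [0028] describes, is sketched below. The publication does not specify the mapping; the softmax form, the `temperature` parameter, and the function names are assumptions chosen only because they guarantee that a larger weight gives a higher probability.

```python
import math
import random


def appearance_probabilities(weights, temperature=1.0):
    """Map per-behavior weights to probabilities (assumed softmax mapping)."""
    top = max(weights.values())  # subtract the maximum for numerical stability
    exps = {b: math.exp((w - top) / temperature) for b, w in weights.items()}
    total = sum(exps.values())
    return {b: e / total for b, e in exps.items()}


def choose_behavior(weights, rng=random):
    """Sample one behavior according to its appearance probability."""
    probs = appearance_probabilities(weights)
    behaviors = list(probs)
    return rng.choices(behaviors, weights=[probs[b] for b in behaviors], k=1)[0]
```

With this mapping a heavily penalized behavior becomes rare but never strictly impossible, which matches the text's "hardly at all".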

[0029] Such weighting can be applied to a complete behavior as a whole, or to each of the individual motions of a behavior that is completed as a series of motions. The specific example detailed later is of the latter kind.

[0030] As described above, by learning behaviors and determining their appearance probabilities, the robot apparatus can be "disciplined" in the way a pet is, and achieves a more lifelike expression.

[0031] (2) Specific configuration for learning behavior

As described above, behaviors can be learned with weighting. One technique for learning behavior is that disclosed in Japanese Patent Application Laid-Open No. H11-126198. The robot apparatus to which the present invention is applied learns behavior using this technique, for example. An outline of the technique is given here.

[0032] FIG. 2 shows an example configuration of the data processing unit. The configuration shown in FIG. 2 is a specific configuration of the learning unit 1 shown in FIG. 1. As detailed later, the robot apparatus has a sensor that detects obstacles and a motor that is driven to move the robot, and information from these is supplied to the data processing unit as the learning target.

[0033] An input $x_t$ corresponding to the states of the sensor and the motor is supplied to each of the n RNNs 1-1 to 1-n. RNN 1-1 is configured as shown in FIG. 3. Although not illustrated, the other RNNs 1-2 to 1-n are configured in the same way as RNN 1-1 shown in FIG. 3.

[0034] As shown in FIG. 3, RNN 1-1 has a predetermined number of input-layer neurons 31, to which an input $s_t$ corresponding to the sensor state and an input $m_t$ corresponding to the motor state are supplied. The outputs of the neurons 31 are supplied to output-layer neurons 33 via intermediate-layer neurons 32. The output-layer neurons 33 output an output $s_{t+1}$ corresponding to the sensor state of RNN 1-1 and an output $m_{t+1}$ corresponding to the motor state. Part of the output is fed back to the input-layer neurons 31 as a context $C_t$.

[0035] The outputs of RNNs 1-1 to 1-n are supplied via the corresponding gates 2-1 to 2-n to the combining circuit 3, where they are combined to give the predicted output $y_{t+1}$.

[0036] During learning, the error between the target value $y^*_{t+1}$, which serves as the teacher signal, and the output of each of RNNs 1-1 to 1-n controls the state of the corresponding gate 2-1 to 2-n.

[0037] A configuration similar to the lower-level RNNs 1-1 to 1-n, gates 2-1 to 2-n, and combining circuit 3 described above is also formed at a higher level of the hierarchy. That is, the higher level is provided with RNNs 11-1 to 11-n, gates 12-1 to 12-n, and a combining circuit 13. A sequence (gate sequence) $G_T$ corresponding to the conduction states (degrees of opening) of the lower-level gates 2-1 to 2-n is supplied to RNNs 11-1 to 11-n. RNNs 11-1 to 11-n output $G^1_{T+1}$ to $G^n_{T+1}$, respectively, and the combining circuit 13 outputs the predicted output $G_{T+1}$. During learning, a target value $G^*_{T+1}$ is supplied as the teacher signal. Although FIG. 2 shows only two levels, still higher levels can be provided as needed.

[0038] FIG. 4 shows the configuration of the first RNN 11-1 of the higher level. The other RNNs 11-2 to 11-n have the same configuration as RNN 11-1 shown in FIG. 4.

[0039] As shown in FIG. 4, the higher-level RNN 11-1 is basically configured in the same way as the lower-level RNN 1-1 shown in FIG. 3, with a plurality of neurons 41 in the input layer, a plurality of neurons 42 in the intermediate layer, and a plurality of neurons 43 in the output layer. The input layer receives signals $g^1_T$ to $g^n_T$ corresponding to the conduction states of gates 2-1 to 2-n, together with the period (duration) $I_T$ for which a gate is conducting (open). In response to these inputs, the output layer outputs $g^1_{T+1}$ to $g^n_{T+1}$ and $I_{T+1}$. Part of the output of the output layer is fed back to the input layer as a context $C_T$.

[0040] Here, the algorithm of RNNs 1-1 to 1-n is described. The conduction state of a gate is expressed by Equation (1), using a softmax activation function.

[0041]

(Equation 1)
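The equation itself is not reproduced in this text. Since the surrounding paragraphs describe a softmax over per-gate internal states, a plausible reconstruction, assuming the usual softmax form and a sum over all n gates, is:

$$
g^{i} = \frac{e^{s^{i}}}{\sum_{j=1}^{n} e^{s^{j}}}
$$

The symbols $g^i$ and $s^i$ are those defined in the text; the summation index $j$ is introduced here for illustration.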

[0042] Here, $g^i$ denotes the gate coefficient corresponding to the conduction state of the i-th gate, and $s^i$ denotes the value corresponding to the internal state of the i-th gate. Accordingly, the output $y_{t+1}$ of the combining circuit 3 is given by Equation (2).

[0043]

(Equation 2)
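The equation is not reproduced in this text. Assuming that the combining circuit forms a gate-weighted sum of the individual RNN outputs, as is usual for a gated mixture of experts, it would read:

$$
y_{t+1} = \sum_{i=1}^{n} g^{i}\, y^{i}_{t+1}
$$

Here $y^i_{t+1}$ is the output of the i-th RNN, as used in the later paragraphs; whether the original also carries a time index on $g^i$ cannot be determined from the text.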

[0044] Here, the likelihood function given by Equation (3) is defined; it is maximized in prediction learning.

[0045]

(Equation 3)
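The equation is not reproduced in this text. A form consistent with the scaling parameter $\sigma$ and the squared prediction error mentioned in the following paragraphs, assuming the standard Gaussian mixture-of-experts likelihood written in logarithmic form, is:

$$
\ln L = \ln \sum_{i=1}^{n} g^{i} \exp\!\left( -\frac{\lVert y^{*}_{t+1} - y^{i}_{t+1} \rVert^{2}}{2\sigma^{2}} \right)
$$

The symbol $L$ and the factor $1/(2\sigma^2)$ are assumptions of this reconstruction.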

[0046] Here, $\sigma$ denotes a scaling parameter.

[0047] During learning, the weight coefficients of RNNs 1-1 to 1-n and the gate coefficients $g$ are updated simultaneously so as to maximize the likelihood function. During recognition, only the gate coefficients are updated.

[0048] To establish rules for updating these weight coefficients and gate coefficients, the gradient of the exponential function of the likelihood function with respect to the internal variable $s^i$, and its gradient with respect to the output $y^i$ of the i-th RNN, are obtained as in Equations (4) and (5).

[0049]

(Equation 4)

[0050]

(Equation 5)
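Equations (4) and (5) are not reproduced in this text. Assuming that the gradients are those of the logarithm of a Gaussian mixture likelihood with softmax gates (the quantity the text calls the "exponential function of the likelihood function" is taken here to be $\ln L$), differentiation gives:

$$
\frac{\partial \ln L}{\partial s^{i}} = g(i \mid x_t, y^{*}_{t+1}) - g^{i}
$$

$$
\frac{\partial \ln L}{\partial y^{i}_{t+1}} = g(i \mid x_t, y^{*}_{t+1})\, \frac{y^{*}_{t+1} - y^{i}_{t+1}}{\sigma^{2}}
$$

The second expression contains the error term $y^*_{t+1} - y^i_{t+1}$ weighted by the posterior probability of the i-th RNN, which agrees with the description given for Equation (5) in the text.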

[0051] Here, $g(i \mid x_t, y^*_{t+1})$ denotes the posterior probability that the i-th RNN generates the target output $y^*_{t+1}$ given the input $x_t$, and is given by Equation (6).

[0052]

(Equation 6)
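The equation is not reproduced in this text. Under the same Gaussian mixture assumption, the posterior probability follows from Bayes' rule, with the gate coefficients acting as priors:

$$
g(i \mid x_t, y^{*}_{t+1}) = \frac{ g^{i} \exp\!\left( -\dfrac{\lVert y^{*}_{t+1} - y^{i}_{t+1} \rVert^{2}}{2\sigma^{2}} \right) }{ \sum_{j=1}^{n} g^{j} \exp\!\left( -\dfrac{\lVert y^{*}_{t+1} - y^{j}_{t+1} \rVert^{2}}{2\sigma^{2}} \right) }
$$

The index $j$ in the denominator corresponds to the squared error term with index $j$ that the text explains immediately afterwards.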

[0053] Here, $\lVert y^*_{t+1} - y^j_{t+1} \rVert^2$ denotes the squared error of the current prediction.

[0054] Equation (4) gives the direction in which $s^i$ is updated. As Equation (5) shows, the gradient of the exponential function of the likelihood function with respect to $y^i_{t+1}$ contains the error term $y^*_{t+1} - y^i_{t+1}$. This error term is weighted by the posterior probability of the i-th RNN.

[0055] In this way, the weight coefficients of RNNs 1-1 to 1-n are adjusted so as to correct the error between the output of the i-th RNN and the target value, in proportion only to the posterior probability. As a result, only one expert RNN among the n RNNs comes to learn a given training pattern exclusively. The error of each RNN is given by Equation (7).

[0056]

(Equation 7)

[0057] The actual learning of RNNs 1-1 to 1-n is carried out by backpropagation on the basis of the error obtained from the above equation.

[0058] Learning thus proceeds so that RNNs 1-1 to 1-n each become an expert capable of identifying a particular time-series pattern in the input $x_t$ that differs from those handled by the others.
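The gating computations outlined in paragraphs [0040] to [0058] can be illustrated numerically as follows. This is a sketch under stated assumptions rather than the method of the cited publication: it assumes a softmax gate, a gate-weighted sum as the combined output, and a Gaussian posterior, and all function names are hypothetical. The expert RNNs themselves are replaced by fixed output vectors.

```python
import math


def gate_coefficients(s):
    """Softmax over the gate internal states s^i."""
    top = max(s)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in s]
    total = sum(exps)
    return [e / total for e in exps]


def combined_output(g, expert_outputs):
    """Gate-weighted sum of the expert outputs (role of the combining circuit)."""
    dim = len(expert_outputs[0])
    return [sum(gi * y[k] for gi, y in zip(g, expert_outputs)) for k in range(dim)]


def posterior(g, expert_outputs, target, sigma=1.0):
    """Posterior probability of each expert given the target output."""

    def squared_error(y):
        return sum((t - v) ** 2 for t, v in zip(target, y))

    lik = [gi * math.exp(-squared_error(y) / (2.0 * sigma ** 2))
           for gi, y in zip(g, expert_outputs)]
    total = sum(lik)
    return [v / total for v in lik]


def output_error_signals(g, expert_outputs, target, sigma=1.0):
    """Posterior-weighted error term per expert, the quantity that would be backpropagated."""
    h = posterior(g, expert_outputs, target, sigma)
    return [[hi * (t - v) / sigma ** 2 for t, v in zip(target, y)]
            for hi, y in zip(h, expert_outputs)]
```

Because each expert's error signal is scaled by its posterior, the expert that already predicts the target best receives the largest share of the update, which is how the experts come to specialize on different time-series patterns.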

[0059] The same applies to RNNs 11-1 to 11-n at the higher level. In this case, however, the input is the gate sequence $G_T$ and the output is $G^i_{T+1}$.

[0060] With this configuration, RNNs 1-1 to 1-n can each learn an individual motion. The expression of each motion learned by RNNs 1-1 to 1-n is governed by gates 2-1 to 2-n, and RNNs 11-1 to 11-n learn the various sequences in which these gates operate (that is, the various orderings of motions). This way of learning information thus makes it possible to learn behavior in segmented form.

[0061] By having the learning unit 1 composed of a plurality of RNNs in this way, the robot apparatus can move through rooms forming passages as shown in FIG. 5 and learn behavior as it moves. For example, it learns behavior based on a distance sensor. Specifically, the robot apparatus moves through the rooms and, in doing so, learns behavior through self-organization of the layers that make up the learning unit.

[0062] Japanese Patent Application Laid-Open No. H11-126198 also discloses that using the RNNs described above makes it possible to associate (call to mind) a behavior without actually performing it; this is made possible, for example, by giving the data processing unit a configuration such as that shown in FIG. 6.

[0063] The learning technique outlined above is disclosed in Japanese Patent Application Laid-Open No. H11-126198, and the learning unit 1 of the robot apparatus according to the embodiment of the present invention can be built by adopting such a learning technique.

[0064] When the learning unit 1 is configured in this way, behaviors are weighted as follows.

【００６５】ロボット装置のセンサ等による検出部２の
検出結果として、「撫でられた」、「叩かれた」等の入
力が発生した場合には、現在実行中の行動をゲート２−
１〜２−ｎの状況により決定し、下記の表に示すよう
に、対応する動作にスコアを対応付けて記憶する。すな
わち、学習した個々の動作に対してスコアを付ける。When an input such as “stroked” or “hit” occurs as a detection result of the detection unit 2, which uses the sensors and the like of the robot apparatus, the action currently being executed is determined from the state of the gates 2-1 to 2-n, and a score is stored in association with the corresponding operation, as shown in the table below. That is, a score is assigned to each learned operation.

【００６６】[0066]

【表１】 [Table 1]

【００６７】例えば、出現させる確率を高くする行為、
例えば「撫でられた」の行為がなされた場合には、スコ
アを＋１として、一方、出現させる確率を低くする行
為、例えば「叩かれた」の行為がなされた場合には、ス
コアを−１とする。For example, when an act that raises the probability of appearance, such as “being stroked”, is performed, the score is set to +1; when an act that lowers the probability of appearance, such as “being hit”, is performed, the score is set to −1.

【００６８】そして、ロボット装置が次回において学習
した動作を決定する際に、上述の行動の連想を可能とす
る図６に示すようなデータ処理部により、行動の予行演
習（リハーサル）を行う。そして、その中で現れる一連
の動作とされる行動に対して、スコアの和を求める。ロ
ボット装置は、そのようなしてリハーサルによって得た
スコアの和に基づいて、実際に出現させる行動（一連の
動作の結合）を決定するようにする。すなわち例えば、
スコアの和ができるだけ大きくなるように決定すればロ
ボット装置は、従順に行動するようになり、一方、スコ
アの和ができるだけ小さくなるように決定すればロボッ
ト装置は、反抗的に行動するようになる。The next time the robot apparatus decides on a learned operation, it performs a rehearsal of the behavior using the data processing unit shown in FIG. 6, which makes the above-described association of behavior possible. It then obtains the sum of the scores for each behavior, that is, each series of operations, that appears in the rehearsal. Based on the score sums obtained by rehearsal, the robot apparatus decides which behavior (a combination of a series of operations) to actually exhibit. For example, if the behavior is chosen so that the score sum is as large as possible, the robot apparatus acts obediently; if it is chosen so that the score sum is as small as possible, the robot apparatus acts rebelliously.
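
The scoring and rehearsal-based selection described above can be illustrated with a minimal sketch. The operation names, the `scores` table, and the helper functions are hypothetical and are not taken from the patent; only the +1 / −1 scoring and the "largest sum is obedient, smallest sum is rebellious" rule follow the text.

```python
from collections import defaultdict

# Hypothetical score table: operation name -> accumulated score.
scores = defaultdict(int)

def record_feedback(current_operation, feedback):
    """Add +1 for "stroked" and -1 for "hit" to the operation being executed."""
    delta = {"stroked": 1, "hit": -1}[feedback]
    scores[current_operation] += delta

def score_sum(behavior):
    """Sum the scores of the operations that make up one rehearsed behavior."""
    return sum(scores[op] for op in behavior)

def choose_behavior(rehearsed, obedient=True):
    """Pick the rehearsed behavior with the largest (or smallest) score sum."""
    pick = max if obedient else min
    return pick(rehearsed, key=score_sum)

# Made-up feedback history.
record_feedback("turn_left", "stroked")
record_feedback("turn_left", "stroked")
record_feedback("go_straight", "hit")

# Made-up behaviors (series of operations) produced by a rehearsal.
candidates = [
    ("turn_left", "go_straight"),
    ("turn_left", "turn_left"),
    ("go_straight", "go_straight"),
]
print(choose_behavior(candidates, obedient=True))   # ('turn_left', 'turn_left')
print(choose_behavior(candidates, obedient=False))  # ('go_straight', 'go_straight')
```

In this sketch a single flag switches between the obedient and the rebellious policy; the patent text does not specify how that choice is made.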

【００６９】この例では、低いレベルの動作に対して重
み付けをして、その動作を制御することについて説明し
たが、上述の特開平11-126198号公報に開示されている
ＲＮＮシステムのように多段層の階層構造を採用するこ
ともできるので、さらに上位の階層（一連の行動、行動
ポリシー）等に対する制御もできることはいうまでもな
い。This example has described weighting low-level operations and thereby controlling them. However, since a multi-level hierarchical structure can be adopted, as in the RNN system disclosed in the above-mentioned Japanese Patent Application Laid-Open No. H11-126198, it goes without saying that higher levels (a series of actions, an action policy, and so on) can also be controlled.

【００７０】（３）本実施の形態によるロボット装置の
構成次に、上述したような行動の学習をするロボット装置の
具体的な構成について説明する。(3) Configuration of the Robot Apparatus According to the Present Embodiment
Next, a specific configuration of the robot apparatus that learns behavior as described above will be described.

【００７１】図７に示すように、「犬」を模した形状の
いわゆるペットロボットとされ、胴体部ユニット１０２
の前後左右にそれぞれ脚部ユニット１０３Ａ，１０３
Ｂ，１０３Ｃ，１０３Ｄが連結されると共に、胴体部ユ
ニット１０２の前端部及び後端部にそれぞれ頭部ユニッ
ト１０４及び尻尾部ユニット１０５が連結されて構成さ
れている。As shown in FIG. 7, the robot apparatus is a so-called pet robot shaped like a “dog”, in which leg units 103A, 103B, 103C, and 103D are connected to the front, rear, left, and right of a body unit 102, and a head unit 104 and a tail unit 105 are connected to the front end and the rear end of the body unit 102, respectively.

【００７２】胴体部ユニット１０２には、図８に示すよ
うに、ＣＰＵ（Central ProcessingUnit）１１０、ＤＲ
ＡＭ（Dynamic Random Access Memory）１１１、フラッ
シュＲＯＭ（Read ０nly Memory）１１２、ＰＣ（Perso
nal Computer）カードインターフェース回路１１３及び
信号処理回路１１４が内部バス１１５を介して相互に接
続されることにより形成されたコントロール部１１６
と、このロボット装置１００の動力源としてのバッテリ
１１７とが収納されている。また、胴体部ユニット１０
２には、ロボット装置１００の向きや動きの加速度を検
出するための角速度センサ１１８及び加速度センサ１１
９なども収納されている。As shown in FIG. 8, the body unit 102 houses a control unit 116, formed by connecting a CPU (Central Processing Unit) 110, a DRAM (Dynamic Random Access Memory) 111, a flash ROM (Read Only Memory) 112, a PC (Personal Computer) card interface circuit 113, and a signal processing circuit 114 to one another via an internal bus 115, and a battery 117 serving as the power source of the robot apparatus 100. The body unit 102 also houses an angular velocity sensor 118, an acceleration sensor 119, and the like for detecting the orientation and the acceleration of movement of the robot apparatus 100.

【００７３】また、頭部ユニット１０４には、外部の状
況を撮像するためのＣＣＤ（ChargeCoupled Device）カ
メラ１２０と、使用者からの「撫でる」や「叩く」とい
った物理的な働きかけにより受けた圧力を検出するため
のタッチセンサ１２１と、前方に位置する物体までの距
離を測定するための距離センサ１２２と、外部音を集音
するためのマイクロホン１２３と、鳴き声等の音声を出
力するためのスピーカ１２４と、ロボット装置１００の
「目」に相当するＬＥＤ（Light Emitting Diode）（図
示せず）となどがそれぞれ所定位置に配置されている。The head unit 104 is provided, at respective predetermined positions, with a CCD (Charge Coupled Device) camera 120 for imaging the external situation, a touch sensor 121 for detecting pressure applied by physical actions from the user such as “stroking” or “hitting”, a distance sensor 122 for measuring the distance to an object located ahead, a microphone 123 for collecting external sounds, a speaker 124 for outputting sounds such as cries, and LEDs (Light Emitting Diodes) (not shown) corresponding to the “eyes” of the robot apparatus 100.

【００７４】さらに、各脚部ユニット１０３Ａ〜１０３
Ｄの関節部分や各脚部ユニット１０３Ａ〜１０３Ｄ及び
胴体部ユニット１０２の各連結部分、頭部ユニット１０
４及び胴体部ユニット１０２の連結部分、並びに尻尾部
ユニット１０５の尻尾１０５Ａの連結部分などにはそれ
ぞれ自由度数分のアクチュエータ１２５_１〜１２５_ｎ及
びポテンショメータ１２６_１〜１２６_ｎが配設されてい
る。例えば、アクチュエータ１２５_１〜１２５_ｎはサー
ボモータを構成として有している。サーボモータの駆動
により、脚部ユニット１０３Ａ〜１０３Ｄが制御され
て、目標の姿勢或いは動作に遷移する。Further, actuators 125_1 to 125_n and potentiometers 126_1 to 126_n, as many as the respective degrees of freedom, are provided at the joints of the leg units 103A to 103D, the connecting portions between the leg units 103A to 103D and the body unit 102, the connecting portion between the head unit 104 and the body unit 102, the connecting portion of the tail 105A of the tail unit 105, and so on. For example, each of the actuators 125_1 to 125_n includes a servomotor. Driving the servomotors controls the leg units 103A to 103D so that the robot transitions to a target posture or motion.

【００７５】そして、これら角速度センサ１１８、加速
度センサ１１９、タッチセンサ１２１、距離センサ１２
２、マイクロホン１２３、スピーカ１２４及び各ポテン
ショメータ１２６_１〜１２６_ｎなどの各種センサ並びに
ＬＥＤ及び各アクチュエータ１２５_１〜１２５_ｎは、
それぞれ対応するハブ１２７_１〜１２７_ｎを介してコン
トロール部１１６の信号処理回路１１４と接続され、Ｃ
ＣＤカメラ１２０及びバッテリ１１７は、それぞれ信号
処理回路１１４と直接接続されている。The various sensors, such as the angular velocity sensor 118, the acceleration sensor 119, the touch sensor 121, the distance sensor 122, the microphone 123, the speaker 124, and the potentiometers 126_1 to 126_n, as well as the LEDs and the actuators 125_1 to 125_n, are connected to the signal processing circuit 114 of the control unit 116 via corresponding hubs 127_1 to 127_n, while the CCD camera 120 and the battery 117 are each connected directly to the signal processing circuit 114.

【００７６】信号処理回路１１４は、上述の各センサか
ら供給されるセンサデータや画像データ及び音声データ
を順次取り込み、これらをそれぞれ内部バス１１５を介
してＤＲＡＭ１１１内の所定位置に順次格納する。また
信号処理回路１１４は、これと共にバッテリ１１７から
供給されるバッテリ残量を表すバッテリ残量データを順
次取り込み、これをＤＲＡＭ１１１内の所定位置に格納
する。The signal processing circuit 114 sequentially takes in the sensor data, image data, and audio data supplied from each of the above-mentioned sensors, and sequentially stores them at predetermined positions in the DRAM 111 via the internal bus 115. In addition, the signal processing circuit 114 sequentially takes in remaining battery power data indicating the remaining battery power supplied from the battery 117 and stores the data in a predetermined position in the DRAM 111.

【００７７】このようにしてＤＲＡＭ１１１に格納され
た各センサデータ、画像データ、音声データ及びバッテ
リ残量データは、この後ＣＰＵ１１０がこのロボット装
置１００の動作制御を行う際に利用される。The sensor data, image data, voice data, and remaining battery data stored in the DRAM 111 in this manner are used when the CPU 110 subsequently controls the operation of the robot apparatus 100.

【００７８】実際上ＣＰＵ１１０は、ロボット装置１０
０の電源が投入された初期時、胴体部ユニット１０２の
図示しないＰＣカードスロットに装填されたメモリカー
ド１２８又はフラッシュＲＯＭ１１２に格納された制御
プログラムをＰＣカードインターフェース回路１１３を
介して又は直接読み出し、これをＤＲＡＭ１１１に格納
する。In practice, at the initial stage when the power of the robot apparatus 100 is turned on, the CPU 110 reads the control program stored in the memory card 128 loaded in a PC card slot (not shown) of the body unit 102, via the PC card interface circuit 113, or reads the control program stored in the flash ROM 112 directly, and stores it in the DRAM 111.

【００７９】また、ＣＰＵ１１０は、この後上述のよう
に信号処理回路１１４よりＤＲＡＭ１１１に順次格納さ
れる各センサデータ、画像データ、音声データ及びバッ
テリ残量データに基づいて自己及び周囲の状況や、使用
者からの指示及び働きかけの有無などを判断する。Thereafter, based on the sensor data, image data, audio data, and remaining battery data sequentially stored in the DRAM 111 by the signal processing circuit 114 as described above, the CPU 110 judges its own situation and that of its surroundings, and whether there are instructions or actions from the user.

【００８０】さらに、ＣＰＵ１１０は、この判断結果及
びＤＲＡＭ１１１に格納した制御プログラムに基づいて
続く行動を決定すると共に、当該決定結果に基づいて必
要なアクチュエータ１２５_１〜１２５_ｎを駆動させるこ
とにより、頭部ユニット１０４を上下左右に振らせた
り、尻尾部ユニット１０５の尻尾１０５Ａを動かせた
り、各脚部ユニット１０３Ａ〜１０３Ｄを駆動させて歩
行させるなどの行動を行わせる。Further, the CPU 110 determines the subsequent action based on this judgment result and the control program stored in the DRAM 111, and drives the necessary actuators 125_1 to 125_n based on that decision, thereby causing the robot to perform actions such as swinging the head unit 104 up, down, left, and right, moving the tail 105A of the tail unit 105, and driving the leg units 103A to 103D to walk.

【００８１】また、この際ＣＰＵ１１０は、必要に応じ
て音声データを生成し、これを信号処理回路１１４を介
して音声信号としてスピーカ１２４に与えることにより
当該音声信号に基づく音声を外部に出力させたり、上述
のＬＥＤを点灯、消灯又は点滅させる。At this time, the CPU 110 also generates audio data as necessary and supplies it to the speaker 124 as an audio signal via the signal processing circuit 114, thereby outputting sound based on that audio signal, or turns the above-mentioned LEDs on or off or makes them blink.

【００８２】このようにしてこのロボット装置１００に
おいては、自己及び周囲の状況や、使用者からの指示及
び働きかけに応じて自律的に行動し得るようになされて
いる。In this way, the robot device 100 is capable of acting autonomously in accordance with the situation of itself and the surroundings, and instructions and actions from the user.

【００８３】（２）制御プログラムのソフトウェア構成ここで、ロボット装置１００における上述の制御プログ
ラムのソフトウェア構成は、図９に示すようになる。こ
の図９において、デバイス・ドライバ・レイヤ３０は、
この制御プログラムの最下位層に位置し、複数のデバイ
ス・ドライバからなるデバイス・ドライバ・セット１３
１から構成されている。この場合、各デバイス・ドライ
バは、ＣＣＤカメラ１２０（図８）やタイマ等の通常の
コンピュータで用いられるハードウェアに直接アクセス
するごとを許されたオブジェクトであり、対応するハー
ドウェアからの割り込みを受けて処理を行う。(2) Software Configuration of the Control Program
The software configuration of the above-described control program in the robot apparatus 100 is as shown in FIG. 9. In FIG. 9, the device driver layer 130 is located at the lowest layer of the control program and consists of a device driver set 131 made up of a plurality of device drivers. Each device driver is an object permitted to directly access hardware used in an ordinary computer, such as the CCD camera 120 (FIG. 8) or a timer, and performs processing in response to an interrupt from the corresponding hardware.

【００８４】また、ロボティック・サーバ・オブジェク
ト１３２は、デバイス・ドライバ・レイヤ１３０の最下
位層に位置し、例えば上述の各種センサやアクチュエー
タ１２５_１〜１２５_ｎ等のハードウェアにアクセスする
ためのインターフェースを提供するソフトウェア群でな
るバーチャル・ロボット１３３と、電源の切換えなどを
管理するソフトウェア群でなるパワーマネージャ１３４
と、他の種々のデバイス・ドライバを管理するソフトウ
ェア群でなるデバイス・ドライバ・マネージャ１３５
と、ロボット装置１００の機構を管理するソフトウェア
群でなるデザインド・ロボット１３６とから構成されて
いる。The robotic server object 132 is located at the lowest layer of the device driver layer 130 and consists of a virtual robot 133, a software group that provides an interface for accessing hardware such as the above-described sensors and the actuators 125_1 to 125_n; a power manager 134, a software group that manages power switching and the like; a device driver manager 135, a software group that manages various other device drivers; and a designed robot 136, a software group that manages the mechanism of the robot apparatus 100.

【００８５】マネージャ・オブジェクト１３７は、オブ
ジェクト・マネージャ１３８及びサービス・マネージャ
１３９から構成されている。オブジェクト・マネージャ
１３８は、ロボティック・サーバ・オブジェクト１３
２、ミドル・ウェア・レイヤ１４０、及びアプリケーシ
ョン・レイヤ１４１に含まれる各ソフトウェア群の起動
や終了を管理するソフトウェア群であり、サービス・マ
ネージャ１３９は、メモリカード１２８（図８）に格納
されたコネクションファイルに記述されている各オブジ
ェクト間の接続情報に基づいて各オブジェクトの接続を
管理するソフトウェア群である。The manager object 137 consists of an object manager 138 and a service manager 139. The object manager 138 is a software group that manages the startup and termination of the software groups included in the robotic server object 132, the middleware layer 140, and the application layer 141, and the service manager 139 is a software group that manages the connections between objects based on the inter-object connection information described in a connection file stored in the memory card 128 (FIG. 8).

【００８６】ミドル・ウェア・レイヤ１４０は、ロボテ
ィック・サーバ・オブジェクト１３２の上位層に位置
し、画像処理や音声処理などのこのロボット装置１００
の基本的な機能を提供するソフトウェア群から構成され
ている。また、アプリケーション・レイヤ１４１は、ミ
ドル・ウェア・レイヤ１４０の上位層に位置し、当該ミ
ドル・ウェア・レイヤ１４０を構成する各ソフトウェア
群によって処理された処理結果に基づいてロボット装置
１００の行動を決定するためのソフトウェア群から構成
されている。The middleware layer 140 is located in the layer above the robotic server object 132 and consists of a software group that provides the basic functions of the robot apparatus 100, such as image processing and audio processing. The application layer 141 is located in the layer above the middleware layer 140 and consists of a software group that decides the behavior of the robot apparatus 100 based on the results processed by the software groups constituting the middleware layer 140.

【００８７】なお、ミドル・ウェア・レイヤ１４０及び
アプリケーション・レイヤ１４１の具体なソフトウェア
構成をそれぞれ図１０に示す。FIG. 10 shows specific software configurations of the middleware layer 140 and the application layer 141, respectively.

【００８８】ミドル・ウェア・レイヤ１４０は、図１０
に示すように、騒音検出用、温度検出用、明るさ検出
用、音階認識用、距離検出用、姿勢検出用、タッチセン
サ用、動き検出用及び色認識用の各信号処理モジュール
１５０〜１５８並びに入力セマンティクスコンバータモ
ジュール１５９などを有する認識系１６０と、出力セマ
ンティクスコンバータモジュール１６８並びに姿勢管理
用、トラッキング用、モーション再生用、歩行用、転倒
復帰用、ＬＥＤ点灯用及び音再生用の各信号処理モジュ
ール１６１〜１６７などを有する出力系６９とから構成
されている。As shown in FIG. 10, the middleware layer 140 consists of a recognition system 160, which has signal processing modules 150 to 158 for noise detection, temperature detection, brightness detection, scale recognition, distance detection, posture detection, the touch sensor, motion detection, and color recognition, together with an input semantics converter module 159 and the like, and an output system 169, which has an output semantics converter module 168 together with signal processing modules 161 to 167 for posture management, tracking, motion reproduction, walking, fall recovery, LED lighting, and sound reproduction, and the like.

【００８９】認識系１６０の各信号処理モジュール１５
０〜１５８は、ロボティック・サーバ・オブジェクト１
３２のバーチャル・ロボット１３３によりＤＲＡＭ１１
１（図８）から読み出される各センサデータや画像デー
タ及び音声データのうちの対応するデータを取り込み、
当該データに基づいて所定の処理を施して、処理結果を
入力セマンティクスコンバータモジュール１５９に与え
る。ここで、例えば、バーチャル・ロボット１３３は、
所定の通信規約によって、信号の授受或いは変換をする
部分として構成されている。Each of the signal processing modules 150 to 158 of the recognition system 160 takes in the corresponding data among the sensor data, image data, and audio data read from the DRAM 111 (FIG. 8) by the virtual robot 133 of the robotic server object 132, performs predetermined processing based on that data, and passes the processing result to the input semantics converter module 159. Here, the virtual robot 133 is configured, for example, as a part that exchanges or converts signals according to a predetermined communication protocol.

【００９０】入力セマンティクスコンバータモジュール
１５９は、これら各信号処理モジュール１５０〜１５８
から与えられる処理結果に基づいて、「うるさい」、
「暑い」、「明るい」、「ボールを検出した」、「転倒
を検出した」、「撫でられた」、「叩かれた」、「ドミ
ソの音階が聞こえた」、「動く物体を検出した」又は
「障害物を検出した」などの自己及び周囲の状況や、使
用者からの指令及び働きかけを認識し、認識結果をアプ
リケーション・レイヤ１４１（図８）に出力する。Based on the processing results given by the signal processing modules 150 to 158, the input semantics converter module 159 recognizes its own situation and that of its surroundings, as well as commands and actions from the user, such as “noisy”, “hot”, “bright”, “detected a ball”, “detected a fall”, “stroked”, “hit”, “heard the do-mi-so scale”, “detected a moving object”, or “detected an obstacle”, and outputs the recognition result to the application layer 141 (FIG. 8).

【００９１】アプリケーション・レイヤ１４１は、図１
１に示すように、行動モデルライブラリ１７０、行動切
換えモジュール１７１、学習モジュール１７２、感情モ
デル１７３及び本能モデル１７４の５つのモジュールか
ら構成されている。As shown in FIG. 11, the application layer 141 consists of five modules: a behavior model library 170, an action switching module 171, a learning module 172, an emotion model 173, and an instinct model 174.

【００９２】行動モデルライブラリ１７０には、図１２
に示すように、「バッテリ残量が少なくなった場合」、
「転倒復帰する」、「障害物を回避する場合」、「感情
を表現する場合」、「ボールを検出した場合」などの予
め選択されたいくつかの条件項目にそれぞれ対応させ
て、それぞれ独立した行動モデル１７０_１〜１７０_ｎが
設けられている。As shown in FIG. 12, the behavior model library 170 contains independent behavior models 170_1 to 170_n, each corresponding to one of several preselected condition items such as “when the remaining battery level is low”, “recovering from a fall”, “when avoiding an obstacle”, “when expressing an emotion”, and “when a ball is detected”.

【００９３】そして、これら行動モデル１７０_１〜１７
０_ｎは、それぞれ入力セマンティクスコンバータモジュ
ール１５９から認識結果が与えられたときや、最後の認
識結果が与えられてから一定時間が経過したときなど
に、必要に応じて後述のように感情モデル１７３に保持
されている対応する情動のパラメータ値や、本能モデル
１７４に保持されている対応する欲求のパラメータ値を
参照しながら続く行動をそれぞれ決定し、決定結果を行
動切換えモジュール１７１に出力する。Each of these behavior models 170_1 to 170_n determines the subsequent action when a recognition result is given from the input semantics converter module 159, or when a certain time has elapsed since the last recognition result was given, referring as necessary to the parameter value of the corresponding emotion held in the emotion model 173 and the parameter value of the corresponding desire held in the instinct model 174, as described later, and outputs the decision to the action switching module 171.

【００９４】なお、この実施の形態の場合、各行動モデ
ル１７０_１〜１７０_ｎは、次の行動を決定する手法とし
て、図１３に示すような１つのノード（状態）ＮＯＤＥ
_０〜ＮＯＤＥ_ｎから他のどのノードＮＯＤＥ_０〜ＮＯＤ
Ｅ_ｎに遷移するかを各ノードＮＯＤＥ_０〜ＮＯＤＥ_ｎに
間を接続するアークＡＲＣ_１〜ＡＲＣ_ｎ１に対してそれ
ぞれ設定された遷移確率Ｐ_１〜Ｐ_ｎに基づいて確率的に
決定する有限確率オートマトンと呼ばれるアルゴリズム
を用いる。In this embodiment, each of the behavior models 170_1 to 170_n uses, as the method of determining the next action, an algorithm called a finite probability automaton. As shown in FIG. 13, this algorithm probabilistically determines to which other node NODE_0 to NODE_n a transition is made from one node (state) NODE_0 to NODE_n, based on the transition probabilities P_1 to P_n set for the arcs ARC_1 to ARC_n1 that connect the nodes NODE_0 to NODE_n.
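
A finite probability automaton of the kind described above can be sketched as follows. The node names, arc layout, and probability values are made up for illustration; the patent gives no concrete values at this point.

```python
import random

# Hypothetical arcs: node -> list of (destination node, transition probability).
# The probabilities on the arcs leaving each node sum to 1.
ARCS = {
    "NODE_0": [("NODE_0", 0.2), ("NODE_1", 0.5), ("NODE_2", 0.3)],
    "NODE_1": [("NODE_0", 1.0)],
    "NODE_2": [("NODE_1", 0.6), ("NODE_2", 0.4)],
}

def next_node(node, rng=random):
    """Draw the destination node according to the probabilities on the outgoing arcs."""
    destinations = [dst for dst, _ in ARCS[node]]
    weights = [p for _, p in ARCS[node]]
    return rng.choices(destinations, weights=weights, k=1)[0]

def walk(start, steps, seed=0):
    """Follow the automaton for a number of steps and return the visited nodes."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(next_node(path[-1], rng))
    return path

print(walk("NODE_0", 10))
```

The seed argument only makes a run repeatable; which nodes are visited depends on the random draws.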

【００９５】具体的に、各行動モデル１７０_１〜１７０
_ｎは、それぞれ自己の行動モデル１７０_１〜１７０_ｎを
形成するノードＮＯＤＥ_０〜ＮＯＤＥ_ｎにそれぞれ対応
させて、これらノードＮＯＤＥ_０〜ＮＯＤＥ_ｎごとに図
１４に示すような状態遷移表１８０を有している。Specifically, each of the behavior models 170_1 to 170_n has a state transition table 180 such as that shown in FIG. 14 for each of the nodes NODE_0 to NODE_n that form that behavior model.

【００９６】この状態遷移表１８０では、そのノードＮ
ＯＤＥ_０〜ＮＯＤＥ_ｎにおいて遷移条件とする入力イベ
ント（認識結果）が「入力イベント名」の行に優先順に
列記され、その遷移条件についてのさらなる条件が「デ
ータ名」及び「データ範囲」の行における対応する列に
記述されている。In this state transition table 180, the input events (recognition results) that serve as transition conditions at the node NODE_0 to NODE_n are listed in order of priority in the “input event name” row, and further conditions on each transition condition are described in the corresponding columns of the “data name” and “data range” rows.

【００９７】したがって、図１４の状態遷移表８０で表
されるノードＮＯＤＥ_１００では、「ボールを検出（Ｂ
ＡＬＬ）」という認識結果が与えられた場合に、当該認
識結果と共に与えられるそのボールの「大きさ（ＳＩＺ
Ｅ）」が「0から1000」の範囲であることや、「障害物
を検出（ＯＢＳＴＡＣＬＥ）」という認識結果が与えら
れた場合に、当該認識結果と共に与えられるその障害物
までの「距離（ＤＩＳＴＡＮＣＥ）」が「0から100」の
範囲であることが他のノードに遷移するための条件とな
っている。Thus, at the node NODE_100 represented by the state transition table 180 of FIG. 14, when the recognition result “ball detected (BALL)” is given, the condition for transitioning to another node is that the “size (SIZE)” of the ball given together with that recognition result is in the range “0 to 1000”; when the recognition result “obstacle detected (OBSTACLE)” is given, the condition is that the “distance (DISTANCE)” to the obstacle given together with that recognition result is in the range “0 to 100”.

【００９８】また、このノードＮＯＤＥ_１００では、認
識結果の入力がない場合においても、行動モデル１７０
_１〜１７０_ｎが周期的に参照する感情モデル１７３及び
本能モデル７４にそれぞれ保持された各情動及び各欲求
のパラメータ値のうち、感情モデル７３に保持された
「喜び（ＪＯＹ）」、「驚き（ＳＵＲＰＲＩＳＥ）」若
しくは「悲しみ（ＳＵＤＮＥＳＳ）」のいずれかのパラ
メータ値が「50から100」の範囲であるときには他のノ
ードに遷移することができるようになっている。At this node NODE_100, even when no recognition result is input, a transition to another node can be made when, among the parameter values of the emotions and desires held in the emotion model 173 and the instinct model 174, which the behavior models 170_1 to 170_n refer to periodically, the parameter value of any of “joy (JOY)”, “surprise (SURPRISE)”, or “sadness (SUDNESS)” held in the emotion model 173 is in the range “50 to 100”.

【００９９】また、状態遷移表１８０では、「他のノー
ドヘの遷移確率」の欄における「遷移先ノード」の列に
そのノードＮＯＤＥ_０〜ＮＯＤＥ_ｎから遷移できるノ
ード名が列記されていると共に、「入力イベント名」、
「データ値」及び「データの範囲」の行に記述された全
ての条件が揃ったときに遷移できる他の各ノードＮＯＤ
Ｅ_０〜ＮＯＤＥ_ｎへの遷移確率が「他のノードヘの遷移
確率」の欄内の対応する箇所にそれぞれ記述され、その
ノードＮＯＤＥ_０〜ＮＯＤＥ_ｎに遷移する際に出力すべ
き行動が「他のノードヘの遷移確率」の欄における「出
力行動」の行に記述されている。なお、「他のノードヘ
の遷移確率」の欄における各行の確率の和は１００
［％］となっている。In the state transition table 180, the names of the nodes to which a transition can be made from the node NODE_0 to NODE_n are listed in the “transition destination node” column of the “transition probability to other nodes” section; the probability of transitioning to each of the other nodes NODE_0 to NODE_n when all the conditions described in the “input event name”, “data value”, and “data range” rows are met is described at the corresponding place in the “transition probability to other nodes” section; and the action to be output when transitioning to that node NODE_0 to NODE_n is described in the “output action” row of the same section. The probabilities in each row of the “transition probability to other nodes” section sum to 100 [%].

【０１００】したがって、図１４の状態遷移表１８０で
表されるノードＮＯＤＥ_１００では、例えば「ボールを
検出（ＢＡＬＬ）」し、そのボールの「ＳＩＺＥ（大き
さ）」が「0から1000」の範囲であるという認識結果が
与えられた場合には、「30［％］」の確率で「ノードＮ
ＯＤＥ_１２０（node 120）」に遷移でき、そのとき「Ａ
ＣＴＩＯＮ１」の行動が出力されることとなる。Thus, at the node NODE_100 represented by the state transition table 180 of FIG. 14, when, for example, the recognition result given is that a “ball is detected (BALL)” and the “SIZE” of the ball is in the range “0 to 1000”, a transition to “node NODE_120 (node 120)” can be made with a probability of “30 [%]”, and the action “ACTION1” is output at that time.
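
As a rough illustration of how such a state transition table could be held and evaluated, the sketch below encodes a table for NODE_100. Only the BALL/SIZE and OBSTACLE/DISTANCE conditions and the 30 % transition to NODE_120 with ACTION1 come from the text; the remaining destinations, percentages, the `MOVE_BACK` action, and the data layout are assumptions, since FIG. 14 is not reproduced here.

```python
import random

# Rows of a hypothetical table for NODE_100, listed in priority order.
# "to" holds (destination node, probability in percent, output action).
NODE_100_TABLE = [
    {"event": "BALL", "data": "SIZE", "range": (0, 1000),
     "to": [("NODE_120", 30, "ACTION1"), ("NODE_100", 70, None)]},  # 70 % row is assumed
    {"event": "OBSTACLE", "data": "DISTANCE", "range": (0, 100),
     "to": [("NODE_100", 100, "MOVE_BACK")]},                       # assumed row
]

def step(event, data, table=NODE_100_TABLE, rng=random):
    """Return (destination node, output action), or None if no condition is met."""
    for row in table:
        low, high = row["range"]
        value = data.get(row["data"])
        if event == row["event"] and value is not None and low <= value <= high:
            destinations = [(node, action) for node, _, action in row["to"]]
            percents = [percent for _, percent, _ in row["to"]]
            return rng.choices(destinations, weights=percents, k=1)[0]
    return None

print(step("BALL", {"SIZE": 500}))       # NODE_120/ACTION1 with 30 % probability
print(step("BALL", {"SIZE": 2000}))      # None: SIZE is outside the data range
print(step("OBSTACLE", {"DISTANCE": 50}))
```

Each row's percentages sum to 100, matching the property stated for the table.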

【０１０１】各行動モデル１７０_１〜１７０_ｎは、それ
ぞれこのような状態遷移表１８０として記述されたノー
ドＮＯＤＥ_０〜ＮＯＤＥ_ｎがいくつも繋がるようにし
て構成されており、入力セマンティクスコンバータモジ
ュール１５９から認識結果が与えられたときなどに、対
応するノードＮＯＤＥ_０〜ＮＯＤＥ_ｎの状態遷移表を利
用して確率的に次の行動を決定し、決定結果を行動切換
えモジュール１７１に出力するようになされている。Each of the behavior models 170_1 to 170_n is constructed by linking together a number of nodes NODE_0 to NODE_n, each described by such a state transition table 180. When a recognition result is given from the input semantics converter module 159, for example, the behavior model probabilistically determines the next action using the state transition table of the corresponding node NODE_0 to NODE_n and outputs the decision to the action switching module 171.

【０１０２】図１１に示す行動切換えモジュール１７１
は、行動モデルライブラリ１７０の各行動モデル１７０
_１〜１７０_ｎからそれぞれ出力される行動のうち、予め
定められた優先順位の高い行動モデル１７０_１〜１７０
_ｎから出力された行動を選択し、当該行動を実行すべき
旨のコマンド（以下、これを行動コマンドという。）を
ミドル・ウェア・レイヤ１４０の出力セマンティクスコ
ンバータモジュール１６８に送出する。なお、この実施
の形態においては、図１２において下側に表記された行
動モデル１７０_１〜１７０_ｎほど優先順位が高く設定さ
れている。The action switching module 171 shown in FIG. 11 selects, from among the actions output by the behavior models 170_1 to 170_n of the behavior model library 170, the action output by the behavior model with the highest predetermined priority, and sends a command to execute that action (hereinafter called an action command) to the output semantics converter module 168 of the middleware layer 140. In this embodiment, the lower a behavior model 170_1 to 170_n appears in FIG. 12, the higher its priority is set.
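
The priority rule of the action switching module can be sketched as below. The model names, their order, and the action names are hypothetical (FIG. 12 is not reproduced here); the only behavior taken from the text is that a model listed lower has higher priority.

```python
def switch_action(proposals):
    """Select the action of the highest-priority behavior model that output one.

    proposals: list of (model name, action or None), ordered top to bottom
    as the models would appear in the figure; lower entries win.
    """
    for model, action in reversed(proposals):
        if action is not None:
            return model, action
    return None

# Made-up outputs of five behavior models for one cycle.
proposals = [
    ("ball_detected", "CHASE_BALL"),
    ("express_emotion", None),
    ("avoid_obstacle", "STEP_BACK"),
    ("fall_recovery", None),
    ("battery_low", None),
]
print(switch_action(proposals))  # ('avoid_obstacle', 'STEP_BACK')
```

Here the obstacle-avoidance proposal overrides ball chasing because it sits lower in the assumed list.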

【０１０３】また、学習した行動を再現する際には、行
動切換えモジュール１７１は、指示された所望の行動を
選択して、その行動を実行すべきコマンドを、出力セマ
ンティクスコンバータモジュール１６８に送出する。こ
の行動切換えモジュール１７１からのコマンドにより、
ロボット装置１００は、学習した行動を出力することが
できるようになる。When reproducing a learned behavior, the action switching module 171 selects the desired behavior that was specified and sends a command to execute it to the output semantics converter module 168. This command from the action switching module 171 enables the robot apparatus 100 to output the learned behavior.

【０１０４】さらに、行動切換えモジュール１７１は、
行動完了後に出力セマンティクスコンバータモジュール
１６８から与えられる行動完了情報に基づいて、その行
動が完了したことを学習モジュール１７２、感情モデル
１７３及び本能モデル１７４に通知する。Furthermore, based on the action completion information given from the output semantics converter module 168 after an action is completed, the action switching module 171 notifies the learning module 172, the emotion model 173, and the instinct model 174 that the action has been completed.

【０１０５】一方、学習モジュール１７２は、入力セマ
ンティクスコンバータモジュール１５９から与えられる
認識結果のうち、「叩かれた」や「撫でられた」など、
使用者からの働きかけとして受けた教示の認識結果を入
力する。Meanwhile, among the recognition results given from the input semantics converter module 159, the learning module 172 receives as input the recognition results of teaching received as actions from the user, such as “hit” or “stroked”.

【０１０６】そして、学習モジュール１７２は、この認
識結果及び行動切換えモジュール１７１からの通知に基
づいて、「叩かれた（叱られた）」ときにはその行動の
発現確率を低下させ、「撫でられた（誉められた）」と
きにはその行動の発現確率を上昇させるように、行動モ
デルライブラリ１７０における対応する行動モデル１７
０_１〜１７０_ｎの対応する遷移確率を変更する。Based on this recognition result and the notification from the action switching module 171, the learning module 172 changes the corresponding transition probability of the corresponding behavior model 170_1 to 170_n in the behavior model library 170 so as to lower the probability of the action occurring when the robot is “hit (scolded)” and raise it when the robot is “stroked (praised)”.
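
The patent does not state how the transition probabilities are changed. One possible scheme, shown purely as an assumption, shifts the probability of the arc that produced the action by a fixed step and then rescales the row so that it still sums to 100 %. The step size, row layout, and names are hypothetical.

```python
def adjust_row(row, action, feedback, step=5.0):
    """Raise or lower the probability of the arc that outputs `action`.

    row maps destination node -> (probability in percent, output action).
    The returned row is rescaled so that its probabilities sum to 100.
    """
    sign = {"stroked": 1.0, "hit": -1.0}[feedback]
    shifted = {}
    for node, (percent, output) in row.items():
        if output == action:
            percent = max(percent + sign * step, 0.0)
        shifted[node] = (percent, output)
    total = sum(percent for percent, _ in shifted.values())
    if total == 0.0:
        return dict(row)  # nothing sensible to rescale; keep the row unchanged
    return {node: (100.0 * percent / total, output)
            for node, (percent, output) in shifted.items()}

row = {"NODE_120": (30.0, "ACTION1"), "NODE_100": (70.0, "ACTION2")}
praised = adjust_row(row, "ACTION1", "stroked")
scolded = adjust_row(row, "ACTION1", "hit")
print(praised["NODE_120"][0] > 30.0, scolded["NODE_120"][0] < 30.0)  # True True
```

Rescaling keeps the row consistent with the stated property that each row of the table sums to 100 [%].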

【０１０７】例えば、上述したような学習部１は、実際
のロボット装置１においては、このような学習モジュー
ル１７２において構成され、実現されるものである。For example, the learning section 1 as described above is configured and realized by such a learning module 172 in the actual robot apparatus 1.

【０１０８】他方、感情モデル１７３は、「喜び（jo
y）」、「悲しみ（sadness）」、「怒り（anger）」、
「驚き（surprise）」、「嫌悪（disgust）」及び「恐
れ（fear）」の合計６つの情動について、各情動ごとに
その情動の強さを表すパラメータを保持している。そし
て、感情モデル１７３は、これら各情動のパラメータ値
を、それぞれ入力セマンティクスコンバータモジュール
１５９から与えられる「叩かれた」及び「撫でられた」
などの特定の認識結果と、経過時間及び行動切換えモジ
ュール１７１からの通知となどに基づいて周期的に更新
する。The emotion model 173, on the other hand, holds, for each of a total of six emotions, “joy”, “sadness”, “anger”, “surprise”, “disgust”, and “fear”, a parameter representing the strength of that emotion. The emotion model 173 periodically updates the parameter values of these emotions based on specific recognition results such as “hit” and “stroked” given from the input semantics converter module 159, the elapsed time, notifications from the action switching module 171, and so on.

【０１０９】具体的には、感情モデル１７３は、入力セ
マンティクスコンバータモジュール１５９から与えられ
る認識結果と、そのときのロボット装置１００の行動
と、前回更新してからの経過時間となどに基づいて所定
の演算式により算出されるそのときのその情動の変動量
を△Ｅ［ｔ］、現在のその情動のパラメータ値をＥ
［ｔ］、その情動の感度を表す係数をｋ_ｅとして、
（８）式によって次の周期におけるその情動のパラメー
タ値Ｅ［ｔ＋１］を算出し、これを現在のその情動のパ
ラメータ値Ｅ［ｔ］と置き換えるようにしてその情動の
パラメータ値を更新する。また、感情モデル１７３は、
これと同様にして全ての情動のパラメータ値を更新す
る。Specifically, let ΔE[t] be the amount of change in an emotion at that time, calculated by a predetermined arithmetic expression based on the recognition result given from the input semantics converter module 159, the behavior of the robot apparatus 100 at that time, the time elapsed since the last update, and so on; let E[t] be the current parameter value of that emotion; and let k_e be a coefficient representing the sensitivity of that emotion. The emotion model 173 calculates the parameter value E[t+1] of that emotion for the next cycle by Equation (8) and updates the parameter value by replacing the current value E[t] with it. The emotion model 173 updates the parameter values of all the emotions in the same way.

【０１１０】[0110]

【数８】 (Equation 8)

【０１１１】なお、各認識結果や出力セマンティクスコ
ンバータモジュール１６８からの通知が各情動のパラメ
ータ値の変動量△Ｅ［ｔ］にどの程度の影響を与えるか
は予め決められており、例えば「叩かれた」といった認
識結果は「怒り」の情動のパラメータ値の変動量△Ｅ
［ｔ］に大きな影響を与え、「撫でられた」といった認
識結果は「喜び」の情動のパラメータ値の変動量△Ｅ
［ｔ］に大きな影響を与えるようになっている。The degree to which each recognition result and each notification from the output semantics converter module 168 affects the amount of change ΔE[t] in the parameter value of each emotion is determined in advance. For example, a recognition result such as “hit” has a large effect on the amount of change ΔE[t] in the parameter value of the “anger” emotion, and a recognition result such as “stroked” has a large effect on the amount of change ΔE[t] in the parameter value of the “joy” emotion.

【０１１２】ここで、出力セマンティクスコンバータモ
ジュール１６８からの通知とは、いわゆる行動のフィー
ドバック情報（行動完了情報）であり、行動の出現結果
の情報であり、感情モデル１７３は、このような情報に
よっても感情を変化させる。これは、例えば、「吠え
る」といった行動により怒りの感情レベルが下がるとい
ったようなことである。なお、出力セマンティクスコン
バータモジュール１６８からの通知は、上述した学習モ
ジュール１７２にも入力されており、学習モジュール１
７２は、その通知に基づいて行動モデル１７０_１〜１７
０_ｎの対応する遷移確率を変更する。Here, the notification from the output semantics converter module 168 is so-called action feedback information (action completion information), that is, information on the result of an action having been exhibited, and the emotion model 173 changes the emotions according to such information as well. For example, an action such as “barking” lowers the level of the anger emotion. The notification from the output semantics converter module 168 is also input to the learning module 172 described above, and the learning module 172 changes the corresponding transition probabilities of the behavior models 170_1 to 170_n based on that notification.

【０１１３】一方、本能モデル１７４は、「運動欲（ex
ercise）」、「愛情欲（affection）」、「食欲（appet
ite）」及び「好奇心（curiosity）」の互いに独立した
４つの欲求について、これら欲求ごとにその欲求の強さ
を表すパラメータを保持している。そして、本能モデル
１７４は、これらの欲求のパラメータ値を、それぞれ入
力セマンティクスコンバータモジュール１５９から与え
られる認識結果や、経過時間及び行動切換えモジュール
１７１からの通知などに基づいて周期的に更新する。The instinct model 174, on the other hand, holds, for each of four mutually independent desires, “desire for exercise (exercise)”, “desire for affection (affection)”, “appetite”, and “curiosity”, a parameter representing the strength of that desire. The instinct model 174 periodically updates the parameter values of these desires based on the recognition results given from the input semantics converter module 159, the elapsed time, notifications from the action switching module 171, and so on.

【０１１４】具体的には、本能モデル１７４は、「運動
欲」、「愛情欲」及び「好奇心」については、認識結
果、経過時間及び出力セマンティクスコンバータモジュ
ール１６８からの通知などに基づいて所定の演算式によ
り算出されるそのときのその欲求の変動量をΔＩ
［ｋ］、現在のその欲求のパラメータ値をＩ［ｋ］、そ
の欲求の感度を表す係数ｋ_ｉとして、所定周期で（９）
式を用いて次の周期におけるその欲求のパラメータ値Ｉ
［ｋ＋１］を算出し、この演算結果を現在のその欲求の
パラメータ値Ｉ［ｋ］と置き換えるようにしてその欲求
のパラメータ値を更新する。また、本能モデル１７４
は、これと同様にして「食欲」を除く各欲求のパラメー
タ値を更新する。Specifically, for “desire for exercise”, “desire for affection”, and “curiosity”, let ΔI[k] be the amount of change in the desire at that time, calculated by a predetermined arithmetic expression based on the recognition result, the elapsed time, the notification from the output semantics converter module 168, and so on; let I[k] be the current parameter value of that desire; and let k_i be a coefficient representing the sensitivity of that desire. The instinct model 174 calculates the parameter value I[k+1] of that desire for the next cycle using Equation (9) at a predetermined period and updates the parameter value by replacing the current value I[k] with the result. The instinct model 174 updates the parameter values of each desire other than “appetite” in the same way.

【０１１５】[0115]

【数９】 (Equation 9)

【０１１６】なお、認識結果及び出力セマンティクスコ
ンバータモジュール１６８からの通知などが各欲求のパ
ラメータ値の変動量△Ｉ［ｋ］にどの程度の影響を与え
るかは予め決められており、例えば出力セマンティクス
コンバータモジュール１６８からの通知は、「疲れ」の
パラメータ値の変動量△Ｉ［ｋ］に大きな影響を与える
ようになっている。Note that the degree to which the recognition result and the notification from the output semantics converter module 168 affect the variation ΔI [k] of the parameter value of each desire is determined in advance. For example, the output semantics converter The notification from the module 168 has a large influence on the variation ΔI [k] of the parameter value of “fatigue”.

【０１１７】なお、本実施の形態においては、各情動及
び各欲求（本能）のパラメータ値がそれぞれ0から100ま
での範囲で変動するように規制されており、また係数ｋ
_ｅ、ｋ_ｉの値も各情動及び各欲求ごとに個別に設定され
ている。In the present embodiment, the parameter values of each emotion and each desire (instinct) are restricted to vary within the range of 0 to 100, and the values of the coefficients k_e and k_i are also set individually for each emotion and each desire.
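
Equations (8) and (9) are not reproduced in this text, so the sketch below assumes a simple additive form, new value = current value + sensitivity × change, limited to the stated range of 0 to 100. The additive form, the function name, and all numeric values are assumptions made for illustration only.

```python
def update_parameter(current, delta, sensitivity, low=0.0, high=100.0):
    """Assumed update: current + sensitivity * delta, limited to [low, high]."""
    return min(max(current + sensitivity * delta, low), high)

# Made-up emotion parameters E[t], sensitivities k_e, and changes delta-E[t].
emotions = {"joy": 50.0, "anger": 95.0}
sensitivity = {"joy": 0.5, "anger": 1.0}
deltas = {"joy": 10.0, "anger": 20.0}

emotions = {
    name: update_parameter(value, deltas[name], sensitivity[name])
    for name, value in emotions.items()
}
print(emotions)  # {'joy': 55.0, 'anger': 100.0}
```

The same helper could be applied to the desire parameters I[k] with the coefficients k_i, again under the assumed additive form.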

【０１１８】一方、ミドル・ウェア・レイヤ４０の出力
セマンティクスコンバータモジュール１６８は、図１０
に示すように、上述のようにしてアプリケーション・レ
イヤ１４１の行動切換えモジュール１７１から与えられ
る「前進」、「喜ぶ」、「鳴く」又は「トラッキング
（ボールを追いかける）」といった抽象的な行動コマン
ドを出力系１６９の対応する信号処理モジュール１６１
〜１６７に与える。On the other hand, as shown in FIG. 10, the output semantics converter module 168 of the middleware layer 40 passes abstract action commands such as "move forward", "be pleased", "cry", or "tracking (chase the ball)", given as described above by the action switching module 171 of the application layer 141, to the corresponding signal processing modules 161 to 167 of the output system 169.

【０１１９】そしてこれら信号処理モジュール１６１〜
１６７は、行動コマンドが与えられると当該行動コマン
ドに基づいて、その行動を行うために対応するアクチュ
エータ１２５_１〜１２５_ｎ（図８）に与えるべきサーボ
指令値や、スピーカ１２４（図８）から出力する音の音
声データ及び又は「目」のＬＥＤに与える駆動データを
生成し、これらのデータをロボティック・サーバ・オブ
ジェクト１３２のバーチャル・ロボット１３３及び信号
処理回路１１４（図８）を順次介して対応するアクチュ
エータ１２５_１〜１２５_ｎ又はスピーカ１２４又はＬＥ
Ｄに順次送出する。When given an action command, the signal processing modules 161 to 167 generate, based on that command, the servo command values to be given to the corresponding actuators 125_1 to 125_n (FIG. 8) to perform the action, the audio data of the sound to be output from the speaker 124 (FIG. 8), and/or the drive data to be given to the LEDs of the "eyes". They then send these data to the corresponding actuators 125_1 to 125_n, the speaker 124, or the LEDs, passing in order through the virtual robot 133 of the robotic server object 132 and the signal processing circuit 114 (FIG. 8).
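
The conversion from an abstract action command to device-level data can be pictured as a routing table. The sketch below is only an illustration of that idea: the module functions, the routing table `MODULES`, and every payload value are hypothetical placeholders, not the actual signal processing modules 161 to 167.

```python
from typing import Callable, Dict, List, Tuple

# One low-level output: (destination device, data). All payloads are placeholders.
Output = Tuple[str, dict]


def walking_module(command: str) -> List[Output]:
    # Would compute servo command values for the actuators.
    return [("actuator", {"servo_values": [0.0, 0.0, 0.0, 0.0]})]


def sound_module(command: str) -> List[Output]:
    # Would select the audio data to play through the speaker.
    return [("speaker", {"audio": f"{command}.wav"})]


def emotion_module(command: str) -> List[Output]:
    # An expressive command may drive both the eye LEDs and the speaker.
    return [("led", {"pattern": "blink"}), ("speaker", {"audio": f"{command}.wav"})]


# Hypothetical routing table from abstract command to signal processing module.
MODULES: Dict[str, Callable[[str], List[Output]]] = {
    "move forward": walking_module,
    "be pleased": emotion_module,
    "cry": sound_module,
}


def convert(command: str) -> List[Output]:
    """Translate one abstract action command into low-level device outputs."""
    try:
        module = MODULES[command]
    except KeyError:
        raise ValueError(f"no signal processing module for {command!r}") from None
    return module(command)


for device, data in convert("be pleased"):
    print(device, data)
```

Keeping the routing in a table means the layer that issues abstract commands needs no knowledge of which device each command ends up driving.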

【０１２０】このようにしてロボット装置１００におい
ては、制御プログラムに基づいて、自己（内部）及び周
囲（外部）の状況や、使用者からの指示及び働きかけに
応じた自律的な行動を行うことができるようになされて
いる。In this way, the robot apparatus 100 can, based on the control program, act autonomously in response to its own (internal) state, the surrounding (external) situation, and instructions and approaches from the user.

【０１２１】以上のようなロボット装置１００は、新た
な行動を学習することができるようになり、さらに、学
習した行動に重み付けをすることができるようになる。
これにより、ロボット装置１００は、重み付けに応じ
て、学習した行動の制御をすることができるようにな
る。The robot apparatus 100 described above can learn new actions and can also weight the actions it has learned. This allows the robot apparatus 100 to control the learned actions according to the weighting.

【０１２２】なお、上述の実施の形態では、行動の学習
を、ＲＮＮによる学習、或いはＲＮＮを用いた文節化に
よる行動の学習等について説明した。しかし、これに限
定されるものではなく、他の学習手段により行動を学習
することができることはいうまでもない。この場合、図
１に示すような検出部２、評価部３及び対応付け部４を
学習手段に応じて構成するようにする。In the embodiment described above, behavior learning was explained in terms of learning with an RNN and learning by segmentation using an RNN. However, the invention is not limited to these, and it goes without saying that behavior can be learned by other learning means. In that case, the detection unit 2, the evaluation unit 3, and the association unit 4 shown in FIG. 1 are configured to suit the learning means.

【０１２３】[0123]

【発明の効果】本発明に係るロボット装置は、外部から
の外部入力信号を検出する入力信号検出手段と、入力信
号検出手段により検出された外部入力信号を評価する評
価手段と、評価手段による評価結果を行動内容情報に対
応付けする対応付け手段と、対応付け手段により対応付
けされた評価に基づいて、行動内容情報に基づいて行動
の制御を行う行動制御手段とを備えることにより、入力
信号検出手段により検出された外部入力信号を評価手段
により評価し、評価手段による評価結果を行動内容情報
に対応付け手段により対応付けをし、対応付け手段によ
り対応付けされた評価に基づいて、行動内容情報に基づ
いて行動制御手段により行動の制御をすることができ
る。これにより、ロボット装置は、学習した行動を評価
して、その評価に基づいて行動を出現させることができ
る。The robot apparatus according to the present invention comprises input signal detecting means for detecting an external input signal from outside, evaluating means for evaluating the external input signal detected by the input signal detecting means, associating means for associating the evaluation result of the evaluating means with action content information, and action control means for controlling an action based on the action content information in accordance with the evaluation associated by the associating means. With this configuration, the external input signal detected by the input signal detecting means is evaluated by the evaluating means, the evaluation result is associated with the action content information by the associating means, and the action control means can control the action based on the action content information in accordance with the associated evaluation. The robot apparatus can thus evaluate a learned action and cause the action to appear based on that evaluation.
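
The chain of detection, evaluation, association, and action control summarized above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the signal names, the evaluation scale, the starting weight of 1.0, and the rule of accumulating evaluations into a non-negative weight are all hypothetical. The source states only that the evaluation result can be a probability or weight for making an action appear, and that it can be stored as a pair with the action content information.

```python
import random
from typing import Dict, Optional


def evaluate(signal: str) -> float:
    """Map an external input signal to an evaluation value (hypothetical scale)."""
    return {"praise": 1.0, "scold": -1.0}.get(signal, 0.0)


class AssociationStore:
    """Keeps each action's content information paired with its evaluation (a weight)."""

    def __init__(self) -> None:
        self.weights: Dict[str, float] = {}

    def associate(self, action: str, evaluation: float) -> None:
        # Accumulate evaluations; clamp at zero so the value can serve as a selection weight.
        self.weights[action] = max(0.0, self.weights.get(action, 1.0) + evaluation)

    def choose(self, rng: random.Random) -> Optional[str]:
        """Pick an action with probability proportional to its weight."""
        actions = [a for a, w in self.weights.items() if w > 0.0]
        if not actions:
            return None
        return rng.choices(actions, weights=[self.weights[a] for a in actions], k=1)[0]


store = AssociationStore()
store.associate("spin", evaluate("praise"))  # praised action gains weight
store.associate("bark", evaluate("scold"))   # scolded action drops to weight 0
print(store.weights, store.choose(random.Random(0)))
```

Because "bark" ends with weight 0 in this example, only "spin" remains eligible to appear.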

【０１２４】また、本発明に係るロボット装置の行動制
御方法は、外部からの外部入力信号をロボット装置が検
出する入力信号検出工程と、入力信号検出工程にて検出
された外部入力信号をロボット装置が評価する評価工程
と、評価工程にて得た評価結果を行動内容情報に対応付け
する対応付け工程と、対応付け工程にて対応付けされた
評価に基づいて、行動内容情報に基づいてロボット装置
が行動の制御を行う行動制御工程とを有することによ
り、このようなロボット装置の行動制御方法により行動
の制御がなされるロボット装置は、学習した行動を評価
して、その評価に基づいて行動を出現させることができ
る。Further, the behavior control method for a robot apparatus according to the present invention comprises an input signal detecting step in which the robot apparatus detects an external input signal from outside, an evaluating step in which the robot apparatus evaluates the external input signal detected in the input signal detecting step, an associating step of associating the evaluation result obtained in the evaluating step with action content information, and an action control step in which the robot apparatus controls an action based on the action content information in accordance with the evaluation associated in the associating step. A robot apparatus whose actions are controlled by this method can evaluate a learned action and cause the action to appear based on that evaluation.

【０１２５】また、本発明に係るプログラムは、外部か
らの外部入力信号を検出する入力信号検出工程と、入力
信号検出工程にて検出された外部入力信号を評価する評
価工程と、評価工程にて得た評価結果を行動内容情報に
対応付けする対応付け工程と、対応付け工程にて対応付
けされた評価に基づいて、行動内容情報に基づいて行動
の制御を行う行動制御工程とをロボット装置に実行させ
ることにより、このようなプログラムにより行動の制御
が実行されるロボット装置は、学習した行動を評価し
て、その評価に基づいて行動を出現させることができ
る。Further, the program according to the present invention causes a robot apparatus to execute an input signal detecting step of detecting an external input signal from outside, an evaluating step of evaluating the external input signal detected in the input signal detecting step, an associating step of associating the evaluation result obtained in the evaluating step with action content information, and an action control step of controlling an action based on the action content information in accordance with the evaluation associated in the associating step. A robot apparatus whose actions are controlled by such a program can evaluate a learned action and cause the action to appear based on that evaluation.

【０１２６】また、本発明に係る記録媒体は、外部から
の外部入力信号を検出する入力信号検出工程と、入力信
号検出工程にて検出された外部入力信号を評価する評価
工程と、評価工程にて得た評価結果を行動内容情報に対
応付けする対応付け工程と、対応付け工程にて対応付け
された評価に基づいて、行動内容情報に基づいて行動の
制御を行う行動制御工程とをロボット装置に実行させる
プログラムが記録されており、このような記録媒体に記
録されているプログラムにより行動の制御が実行される
ロボット装置は、学習した行動を評価して、その評価に
基づいて行動を出現させることができる。Further, the recording medium according to the present invention has recorded on it a program that causes a robot apparatus to execute an input signal detecting step of detecting an external input signal from outside, an evaluating step of evaluating the external input signal detected in the input signal detecting step, an associating step of associating the evaluation result obtained in the evaluating step with action content information, and an action control step of controlling an action based on the action content information in accordance with the evaluation associated in the associating step. A robot apparatus whose actions are controlled by the program recorded on such a recording medium can evaluate a learned action and cause the action to appear based on that evaluation.

[Brief description of the drawings]

【図１】実施の形態のロボット装置における発明を実現
する要部を示すブロック図である。FIG. 1 is a block diagram showing a main part for realizing the invention in a robot device according to an embodiment.

【図２】上述の学習部の具体的な構成であって、複数の
ＲＮＮによって階層的に構成されているものを示す図で
ある。FIG. 2 is a diagram showing a specific configuration of the above-described learning unit, which is hierarchically configured by a plurality of RNNs.

【図３】上述の階層構造として構成されている学習部の
下位層のＲＮＮの構成を示す図である。FIG. 3 is a diagram illustrating a configuration of an RNN in a lower layer of a learning unit configured as the above-described hierarchical structure.

【図４】上述の階層構造として構成されている学習部の
上位層のＲＮＮの構成を示す図である。FIG. 4 is a diagram illustrating a configuration of an RNN in an upper layer of a learning unit configured as the above-described hierarchical structure.

【図５】上述の学習部による行動学習を説明するために
使用した図である。FIG. 5 is a diagram used to explain behavior learning by the learning unit described above.

【図６】動作にリハーサルを実現するデータ処理部にお
ける構成を示す図である。FIG. 6 is a diagram illustrating a configuration of a data processing unit that implements rehearsal for operation.

【図７】実施の形態のロボット装置の外観構成を示す斜
視図である。FIG. 7 is a perspective view illustrating an external configuration of the robot device according to the embodiment.

【図８】上述のロボット装置の回路構成を示すブロック
図である。FIG. 8 is a block diagram showing a circuit configuration of the robot device described above.

【図９】上述のロボット装置のソフトウェア構成を示す
ブロック図である。FIG. 9 is a block diagram showing a software configuration of the robot device described above.

【図１０】上述のロボット装置のソフトウェア構成にお
けるミドル・ウェア・レイヤの構成を示すブロック図で
ある。FIG. 10 is a block diagram showing a configuration of a middleware layer in a software configuration of the robot device described above.

【図１１】上述のロボット装置のソフトウェア構成にお
けるアプリケーション・レイヤの構成を示すブロック図
である。FIG. 11 is a block diagram showing a configuration of an application layer in the software configuration of the robot device described above.

【図１２】上述のアプリケーション・レイヤの行動モデ
ルライブラリの構成を示すブロック図である。FIG. 12 is a block diagram showing a configuration of an action model library of the application layer.

【図１３】ロボット装置の行動決定のための情報となる
有限確率オートマトンを説明するために使用した図であ
る。FIG. 13 is a diagram used to explain a finite probability automaton that is information for determining an action of a robot device.

【図１４】有限確率オートマトンの各ノードに用意され
た状態遷移表を示す図である。FIG. 14 is a diagram showing a state transition table prepared for each node of the finite probability automaton.

[Explanation of symbols]

１学習部、２検出部、３評価部、４対応付け部、
１００ロボット装置
1 learning unit, 2 detection unit, 3 evaluation unit, 4 association unit, 100 robot apparatus

Claims

[Claims]

1. A robot apparatus comprising: input signal detecting means for detecting an external input signal from outside; evaluating means for evaluating the external input signal detected by the input signal detecting means; associating means for associating an evaluation result of the evaluating means with action content information; and action control means for controlling an action based on the action content information in accordance with the evaluation associated by the associating means.

2. The robot apparatus according to claim 1, further comprising learning means for learning a new action, wherein the learning means makes the new action correspond to the action content information by learning.

3. The robot apparatus according to claim 1, further comprising learning means for learning a new action, wherein the learning means segments the time-series data of the action to be learned and learns it.

4. The robot apparatus according to claim 1, wherein the evaluation result is a probability or a weight for causing an action to appear.

5. The robot apparatus according to claim 1, wherein the associating means is storage means that stores the action content information and the evaluation result as a pair.

6. The robot apparatus according to claim 1, wherein an internal state is changed according to input information and the robot apparatus acts based on the internal state.

7. A behavior control method for a robot apparatus, comprising: an input signal detecting step in which the robot apparatus detects an external input signal from outside; an evaluating step in which the robot apparatus evaluates the external input signal detected in the input signal detecting step; an associating step of associating the evaluation result obtained in the evaluating step with action content information; and an action control step in which the robot apparatus controls an action based on the action content information in accordance with the evaluation associated in the associating step.

8. A program for causing a robot apparatus to execute: an input signal detecting step of detecting an external input signal from outside; an evaluating step of evaluating the external input signal detected in the input signal detecting step; an associating step of associating the evaluation result obtained in the evaluating step with action content information; and an action control step of controlling an action based on the action content information in accordance with the evaluation associated in the associating step.

9. A recording medium on which is recorded a program for causing a robot apparatus to execute: an input signal detecting step of detecting an external input signal from outside; an evaluating step of evaluating the external input signal detected in the input signal detecting step; an associating step of associating the evaluation result obtained in the evaluating step with action content information; and an action control step of controlling an action based on the action content information in accordance with the evaluation associated in the associating step.