JP2013152656A

JP2013152656A - Information processor, information processing method and program for determining explanatory variable

Info

Publication number: JP2013152656A
Application number: JP2012013698A
Authority: JP
Inventors: Hiroharu Maruhashi; 弘治丸橋; Nobuhiro Yugami; 伸弘湯上
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2012-01-26
Filing date: 2012-01-26
Publication date: 2013-08-08
Anticipated expiration: 2032-01-26
Also published as: JP5794160B2

Abstract

[Problem] To reduce the amount of calculation performed when finding explanatory variables to be added to a prediction model.
[Solution] The information processing apparatus includes: a first calculation unit that calculates, for each of a plurality of objective variables, the error between the actual value of the objective variable and the value calculated by a first prediction model for predicting the value of that objective variable; a classification unit that classifies the plurality of objective variables into a plurality of groups based on the calculated errors; a second calculation unit that calculates, for each of the groups, a representative value of the errors using the errors calculated for the objective variables belonging to that group; and a determination unit that, for each of the groups, generates a plurality of second prediction models for predicting the representative value while varying the explanatory variables, and determines the explanatory variable to be added to the first prediction model of the objective variables belonging to that group based on the difference between each value calculated by the generated second prediction models and the representative value.
[Selected drawing] Figure 1

Description

The present technology relates to techniques for constructing prediction models.

There are techniques for constructing a prediction model that predicts the value of an objective variable that changes over time (for example, a stock price) using the values of explanatory variables (for example, past stock prices).

It is known that, to build a highly accurate prediction model, it is effective to add explanatory variables to the model and predict the value of the objective variable from the values of more explanatory variables. For example, when constructing a prediction model for the stock price of A Transport, using past gasoline prices in addition to A Transport's past stock prices may improve the accuracy of the model.

However, if a very large number of explanatory variables are added to a prediction model in an attempt to raise its accuracy, the model becomes specialized to the time-series data used to construct it (hereinafter referred to as learning data). As a result, when the accuracy of the model is verified using other time-series data, the accuracy may actually be lower. In the field of machine learning, this condition is called "overfitting".

Selecting the explanatory variables that give a prediction model its highest accuracy is very difficult. The simplest approach would be to construct and verify a prediction model for every combination of explanatory variables and adopt the most accurate one. However, as the number of explanatory variables grows, combinatorial explosion makes the amount of calculation enormous, so this approach is not realistic.

Conventionally, therefore, a technique known as stepwise variable selection has been used. It repeats two steps: adding the single most useful explanatory variable to the prediction model, and removing any explanatory variable made unnecessary by the new addition. Indices of the usefulness of an explanatory variable include, for example, the F value for significance as a multiple regression model, AIC (Akaike's Information Criterion), and BIC (Bayesian Information Criterion). In recent years, indices such as AIC and BIC have often been used. Both adopt the explanatory variables that minimize the sum of squared errors.

However, the above technique also has a problem. Its total computational cost is roughly (cost of calculating the usefulness index for one explanatory variable) × (number of objective variables) × (number of candidate explanatory variables). Consequently, the amount of calculation becomes very large as the number of objective variables and the number of explanatory variables grow.
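As a rough worked example of the product above, the following Python sketch evaluates the estimate for made-up problem sizes. The function name `stepwise_cost` and all numbers are illustrative assumptions, not values from this document. The second call assumes, purely for illustration, that the same search is run once per group of objective variables instead of once per objective variable, and it ignores the cost of forming the groups and of any later per-variable step.

```python
def stepwise_cost(cost_per_index, n_targets, n_candidates):
    """Approximate total cost: (cost per index) x (targets) x (candidates)."""
    return cost_per_index * n_targets * n_candidates


# Hypothetical sizes: 1,000 objective variables, 500 candidate variables.
per_variable = stepwise_cost(cost_per_index=1, n_targets=1000, n_candidates=500)

# Assumption for illustration only: the same search run once for each of
# 20 groups instead of once for each objective variable.
per_group = stepwise_cost(cost_per_index=1, n_targets=20, n_candidates=500)

print(per_variable, per_group)  # 500000 10000
```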

Paul A. Murtaugh (2009). Performance of several variable-selection methods applied to real ecological data. Ecology Letters, 12: 1061-1068.

Accordingly, an object of the present technology is, in one aspect, to provide a technique for reducing the amount of calculation performed when finding explanatory variables to be added to a prediction model.

An information processing apparatus according to one aspect of the present technology includes: (A) a storage device; (B) a first calculation unit that, for each of a plurality of objective variables, calculates the error between the actual value of the objective variable and the value calculated by a first prediction model for predicting the value of that objective variable, and stores the error in the storage device; (C) a second calculation unit that classifies the plurality of objective variables into a plurality of groups based on the errors stored in the storage device, calculates, for each of the groups, a representative value of the errors using the errors calculated for the objective variables belonging to that group, and stores the representative value in the storage device; and (D) a first determination unit that, for each of the groups, generates a plurality of second prediction models for predicting the representative value stored in the storage device while varying the explanatory variables, and determines the explanatory variable to be added to the first prediction model of the objective variables belonging to that group based on the difference between each value calculated by the generated second prediction models and the representative value.

The amount of calculation performed when finding explanatory variables to be added to a prediction model can be reduced.

FIG. 1 is a functional block diagram of the information processing apparatus according to the present embodiment.
FIG. 2 is a functional block diagram of the grouping processing unit.
FIG. 3 is a functional block diagram of the candidate extraction unit.
FIG. 4 is a functional block diagram of the determination unit.
FIG. 5 is a diagram illustrating an example of learning data stored in the learning data storage unit.
FIG. 6 is a diagram illustrating the main processing flow.
FIG. 7 is a diagram for explaining the outline of the grouping process.
FIG. 8 is a diagram for explaining the outline of the grouping process.
FIG. 9 is a diagram illustrating the processing flow of the grouping process.
FIG. 10 is a diagram illustrating an example of data stored in the first model storage unit.
FIG. 11 is a diagram illustrating an example of data stored in the first error data storage unit.
FIG. 12 is a diagram illustrating an example of data stored in the grouping result storage unit.
FIG. 13 is a diagram for explaining the outline of the first candidate extraction process.
FIG. 14 is a diagram illustrating the processing flow of the first candidate extraction process.
FIG. 15 is a diagram illustrating an example of data stored in the second model storage unit.
FIG. 16 is a diagram illustrating an example of data stored in the second error data storage unit.
FIG. 17 is a diagram illustrating an example of data stored in the first evaluation value storage unit.
FIG. 18 is a diagram illustrating an example of data stored in the first extraction result storage unit.
FIG. 19 is a diagram for explaining the outline of the second candidate extraction process.
FIG. 20 is a diagram illustrating the processing flow of the second candidate extraction process.
FIG. 21 is a diagram illustrating an example of data stored in the third model storage unit.
FIG. 22 is a diagram illustrating an example of data stored in the third error data storage unit.
FIG. 23 is a diagram illustrating an example of data stored in the second evaluation value storage unit.
FIG. 24 is a diagram illustrating an example of data stored in the second extraction result storage unit.
FIG. 25 is a diagram illustrating an example of data stored in the candidate storage unit.
FIG. 26 is a diagram for explaining the outline of the determination process.
FIG. 27 is a diagram illustrating the processing flow of the determination process.
FIG. 28 is a diagram illustrating an example of data stored in the fourth model storage unit.
FIG. 29 is a diagram illustrating an example of data stored in the fourth error data storage unit.
FIG. 30 is a diagram illustrating an example of data stored in the third evaluation value storage unit.
FIG. 31 is a diagram illustrating an example of data stored in the determination result storage unit.
FIG. 32 is a diagram illustrating an example of the screen data to be displayed.
FIG. 33 is a diagram illustrating the processing flow for determining, for each stock, the explanatory variable to be added to the prediction model.
FIG. 34 is a diagram illustrating the functional blocks of a computer.

FIG. 1 shows a functional block diagram of the information processing apparatus 1 according to the present embodiment. The information processing apparatus 1 includes a grouping processing unit 2, a grouping result storage unit 3, a candidate extraction unit 4, a learning data storage unit 5, a candidate storage unit 6, a determination unit 7, a determination result storage unit 8, and an output unit 9.

The grouping processing unit 2 performs the grouping process using the data stored in the learning data storage unit 5 and stores the result in the grouping result storage unit 3. The candidate extraction unit 4 performs the first and second candidate extraction processes using the data stored in the grouping result storage unit 3 and the learning data storage unit 5, and stores the results in the candidate storage unit 6. The determination unit 7 performs the determination process using the data stored in the grouping result storage unit 3, the learning data storage unit 5, and the candidate storage unit 6, and stores the result in the determination result storage unit 8. The output unit 9 generates the data of the screen to be displayed using the data stored in the grouping result storage unit 3, the candidate storage unit 6, and the determination result storage unit 8, and displays it on a display device or the like.

FIG. 2 shows a functional block diagram of the grouping processing unit 2. The grouping processing unit 2 includes a first model generation unit 21, a first model storage unit 22, a first error calculation unit 23, a first error data storage unit 24, and a group generation unit 25.

The first model generation unit 21 performs processing using the data stored in the learning data storage unit 5 and stores the result in the first model storage unit 22. The first error calculation unit 23 performs processing using the data stored in the learning data storage unit 5 and the first model storage unit 22, and stores the result in the first error data storage unit 24. The group generation unit 25 performs processing using the data stored in the first error data storage unit 24 and stores the result in the grouping result storage unit 3.

FIG. 3 shows a functional block diagram of the candidate extraction unit 4. The candidate extraction unit 4 includes a second model generation unit 401, a second model storage unit 402, a third model storage unit 403, a second error calculation unit 404, a second error data storage unit 405, a third error data storage unit 406, a first evaluation value calculation unit 407, a first evaluation value storage unit 408, a second evaluation value storage unit 409, an extraction unit 410, a first extraction result storage unit 411, a second extraction result storage unit 412, and a first specifying unit 413.

The second model generation unit 401 performs processing using the data stored in the grouping result storage unit 3 and the learning data storage unit 5, and stores the results in the second model storage unit 402 and the third model storage unit 403. The second error calculation unit 404 performs processing using the data stored in the second model storage unit 402 and stores the result in the second error data storage unit 405. It also performs processing using the data stored in the third model storage unit 403 and stores the result in the third error data storage unit 406. The first evaluation value calculation unit 407 performs processing using the data stored in the second error data storage unit 405 and stores the result in the first evaluation value storage unit 408. It also performs processing using the data stored in the third error data storage unit 406 and stores the result in the second evaluation value storage unit 409. The extraction unit 410 performs processing using the data stored in the first evaluation value storage unit 408 and stores the result in the first extraction result storage unit 411. It also performs processing using the data stored in the second evaluation value storage unit 409 and stores the result in the second extraction result storage unit 412. The first specifying unit 413 performs processing using the data stored in the first extraction result storage unit 411 and the second extraction result storage unit 412, and stores the result in the candidate storage unit 6.

FIG. 4 shows a functional block diagram of the determination unit 7. The determination unit 7 includes a third model generation unit 71, a fourth model storage unit 72, a third error calculation unit 73, a fourth error data storage unit 74, a second evaluation value calculation unit 75, a third evaluation value storage unit 76, and a second specifying unit 77.

The third model generation unit 71 performs processing using the data stored in the learning data storage unit 5 and the candidate storage unit 6, and stores the result in the fourth model storage unit 72. The third error calculation unit 73 performs processing using the data stored in the learning data storage unit 5 and the fourth model storage unit 72, and stores the result in the fourth error data storage unit 74. The second evaluation value calculation unit 75 performs processing using the data stored in the fourth error data storage unit 74 and stores the result in the third evaluation value storage unit 76. The second specifying unit 77 performs processing using the data stored in the third evaluation value storage unit 76 and stores the result in the determination result storage unit 8.

FIG. 5 shows an example of the learning data stored in the learning data storage unit 5. In the example of FIG. 5, today's stock price, the stock price one day ago, and the stock price two days ago are stored for each date from July 27 to July 31. The example of FIG. 5 shows the stock price data set for one specific stock; the learning data storage unit 5 stores such data sets for many stocks.
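The layout described for FIG. 5 can be sketched in Python as follows. This is a minimal illustration, assuming a plain list of daily closing prices as input; the function name `make_lagged_rows` and the price values are made up and do not come from the figure.

```python
def make_lagged_rows(prices, n_lags=2):
    """Return (today, [1 day ago, 2 days ago, ...]) for each usable day."""
    rows = []
    for t in range(n_lags, len(prices)):
        lags = [prices[t - k] for k in range(1, n_lags + 1)]
        rows.append((prices[t], lags))
    return rows


# Made-up daily prices for one stock, oldest first.
prices = [100.0, 102.0, 101.0, 105.0, 107.0]
rows = make_lagged_rows(prices)

for today, lags in rows:
    print(today, lags)
```

The first two days are dropped because they do not have two earlier prices available.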

In the present embodiment, it is assumed that a prediction model has been constructed that predicts "today's" stock price of a specific stock from that stock's prices "1 day ago" and "2 days ago", and that explanatory variables to be added are selected in order to improve the accuracy of that prediction model.

Next, the operation of the information processing apparatus 1 shown in FIG. 1 will be described with reference to FIGS. 6 to 33. First, the grouping processing unit 2 performs the grouping process (FIG. 6: step S1). The grouping process will be described with reference to FIGS. 7 to 12.

First, an outline of the grouping process will be described. The grouping process divides the stocks to be processed into groups. The criterion for grouping is the error between the value predicted by the prediction model and the actual value. Specifically, as shown in FIG. 7, the error between the predicted value and the actual value is calculated for each date, and grouping is performed based on error vectors whose components are the errors for the individual dates.

Grouping is performed so that stocks with similar error vectors (that is, with similar trends in how the error fluctuates) fall into the same group. Then, as shown in FIG. 8, a representative error vector is calculated for each group from the error vectors of the stocks belonging to that group.

Next, the processing flow of the grouping process will be described. The first model generation unit 21 in the grouping processing unit 2 identifies one unprocessed stock among the stocks whose learning data is stored in the learning data storage unit 5 (FIG. 9: step S11).

The first model generation unit 21 constructs a prediction model using the learning data of the stock identified in step S11, and stores the data of the constructed prediction model together with the stock name in the first model storage unit 22 (step S13). In step S13, a prediction model (for example, an AR (AutoRegressive) model) is constructed that predicts "today's" price of the stock identified in step S11 from its prices "1 day ago" and "2 days ago". Because the technique for constructing the prediction model is not the main part of the present embodiment, a detailed description is omitted.
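Because the document leaves the model-building method open, the following Python sketch shows only one possible way to fit such a two-lag model: ordinary least squares with an intercept, solved through the normal equations. The choice of least squares, the helper names (`solve_linear`, `fit_least_squares`, `predict`), and the price values are all assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(features, targets):
    """Fit targets ~ intercept + features; returns [intercept, coef1, ...]."""
    xs = [[1.0] + list(f) for f in features]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(coef, feature_row):
    """Evaluate the fitted linear model on one row of explanatory values."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], feature_row))


# Made-up prices for one stock, oldest first.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 104.0, 108.0]
features = [(prices[t - 1], prices[t - 2]) for t in range(2, len(prices))]
targets = [prices[t] for t in range(2, len(prices))]

coef = fit_least_squares(features, targets)
next_price = predict(coef, (prices[-1], prices[-2]))
```

A production implementation would more likely rely on a numerical library; the hand-written solver is used here only to keep the sketch self-contained.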

FIG. 10 shows an example of the data stored in the first model storage unit 22. In the example of FIG. 10, stock names and prediction model data are stored.

Then, using the prediction model constructed in step S13 and the data stored in the learning data storage unit 5, the first error calculation unit 23 calculates an error vector based on the errors between the values calculated by the prediction model (that is, the predicted values) and the actual values (step S15). The first error calculation unit 23 also stores the stock name, the calculation results, and related data in the first error data storage unit 24.

FIG. 11 shows an example of the data stored in the first error data storage unit 24. In the example of FIG. 11, the stock name is stored together with the actual stock price, the predicted value, and the error for each date.
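The error vector of step S15 can be sketched as a per-date difference. The document does not state the sign convention, so the sketch below assumes actual minus predicted; the function name `error_vector` and the values are illustrative only.

```python
def error_vector(actual, predicted):
    """Per-date error, assuming the convention actual minus predicted."""
    return [a - p for a, p in zip(actual, predicted)]


# Made-up actual and predicted prices for three dates.
actual = [101.0, 105.0, 107.0]
predicted = [100.5, 105.5, 106.0]

errors = error_vector(actual, predicted)
print(errors)  # [0.5, -0.5, 1.0]
```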

Returning to the description of FIG. 9, the first model generation unit 21 determines whether any unprocessed stock remains (step S17). If an unprocessed stock remains (step S17: Yes route), the process returns to step S11.

If, on the other hand, no unprocessed stock remains (step S17: No route), the group generation unit 25 divides the stocks into groups using the error vector data stored in the first error data storage unit 24, and temporarily stores the grouping result in a storage device such as main memory (step S19). In step S19, grouping is performed using, for example, the K-means method. Grouping techniques such as clustering are well known, so their description is omitted here.
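For readers unfamiliar with K-means, the following Python sketch groups error vectors with a basic version of Lloyd's algorithm. It is a simplified illustration, not the procedure used in the embodiment: the centroids are seeded with the first `k` vectors (an assumption made to keep the result deterministic), and the stock names other than A Transport and B Airlines, as well as all numbers, are made up.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def k_means(vectors, k, n_iter=100):
    """Basic Lloyd's algorithm; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(n_iter):
        # Assign each vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return labels, centroids


# Made-up error vectors (one component per date) for four stocks.
error_vectors = {
    "A Transport": [1.0, -1.0, 1.0],
    "C Foods": [-1.0, 1.0, -1.0],
    "B Airlines": [0.9, -1.1, 1.2],
    "D Farm": [-1.2, 0.8, -0.9],
}
names = list(error_vectors)
labels, centroids = k_means([error_vectors[n] for n in names], k=2)
groups = dict(zip(names, labels))
print(groups)
```

With these values, A Transport and B Airlines end up in one group and the other two stocks in the other, because their errors rise and fall on the same dates.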

The group generation unit 25 calculates a representative error vector for each group, and stores the names of the stocks belonging to each group and the value of each component of the representative error vector in the grouping result storage unit 3 (step S21). In step S21, the representative error vector is calculated, for example, by averaging the error vectors of the stocks belonging to the group. The process then returns to the calling process.
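The averaging mentioned for step S21 can be sketched as a component-wise mean. The function name `representative_error_vector` and the numbers are illustrative assumptions.

```python
def representative_error_vector(error_vectors):
    """Component-wise mean of the error vectors in one group."""
    n = len(error_vectors)
    return [sum(component) / n for component in zip(*error_vectors)]


# Made-up error vectors of two stocks in the same group, one value per date.
group_errors = [
    [1.0, -1.0, 2.0],
    [3.0, 1.0, 0.0],
]
representative = representative_error_vector(group_errors)
print(representative)  # [2.0, 0.0, 1.0]
```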

FIG. 12 shows an example of the data stored in the grouping result storage unit 3. In the example of FIG. 12, the group identifier, the names of the stocks belonging to the group, and the representative error value for each date are stored. The group identifier is a unique number assigned to each group.

In this way, the stocks are grouped so that stocks with similar trends in error fluctuation fall into the same group. This grouping rests on the idea that an explanatory variable that is effective to add to the prediction model of one stock can also be applied to the prediction models of the other stocks in the same group.

Returning to the description of FIG. 6, the candidate extraction unit 4 performs the first candidate extraction process (step S3). The first candidate extraction process will be described with reference to FIGS. 13 to 18.

First, an outline of the first candidate extraction process will be described with reference to FIG. 13. To simplify the explanation, the representative error vector is treated as a one-dimensional vector. In the first candidate extraction process, N candidate explanatory variables (N is a natural number of 2 or more) are extracted for each group based on how accurately the representative error is predicted. That is, an evaluation value is calculated as the sum of the squared differences between the representative error and the values predicted by a prediction model for predicting the representative error, and N candidates are extracted in ascending order of evaluation value.
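The evaluation value and the selection of the N best candidates described above can be sketched in Python as follows. The function names, the candidate predictions, and the use of more than one date are illustrative assumptions; in particular, the predicted values here are simply made up rather than produced by a fitted model.

```python
def evaluation_value(predicted, representative_error):
    """Sum of squared differences; a smaller value means a better candidate."""
    return sum((p - r) ** 2 for p, r in zip(predicted, representative_error))


def top_n_candidates(scores, n):
    """Return the n candidate names with the smallest evaluation values."""
    return sorted(scores, key=scores.get)[:n]


# Made-up representative error of one group, one value per date.
representative_error = [0.5, -0.5, 1.0]

# Made-up predictions of that representative error, one list per candidate.
predictions = {
    "gasoline": [0.5, -0.5, 0.5],
    "rice": [0.0, 0.0, 0.0],
}

scores = {
    name: evaluation_value(pred, representative_error)
    for name, pred in predictions.items()
}
print(scores)                       # {'gasoline': 0.25, 'rice': 1.5}
print(top_n_candidates(scores, 1))  # ['gasoline']
```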

The prediction model for predicting the representative error, however, uses not only the candidate explanatory variable to be added but also the explanatory variables contained in the prediction models generated in step S13 for the stocks in the group. In the example of FIG. 13, for the group containing A Transport and B Airlines, the prediction model is constructed from the learning data of the explanatory variables A Transport and B Airlines together with the learning data of the candidate explanatory variable to be added (gasoline or rice). The reason is that when a prediction model contains several explanatory variables, certain combinations can greatly improve its accuracy through a synergistic effect. This approach makes it less likely that a candidate will be missed that greatly improves accuracy only when used together with an explanatory variable such as A Transport or B Airlines. In the example of FIG. 13, the evaluation value calculated for gasoline is smaller than that calculated for rice, so for the group to which A Transport and B Airlines belong, gasoline is preferable to rice as the explanatory variable to add.

Next, the processing flow of the first candidate extraction process will be described. First, the second model generation unit 401 in the candidate extraction unit 4 identifies one unprocessed group among the groups registered in the grouping result storage unit 3 (FIG. 14: step S31).

For the stocks belonging to the group identified in step S31, the second model generation unit 401 extracts the explanatory variables used in the prediction models constructed in step S13 (step S33). For example, when group 1 in FIG. 12 is processed, the explanatory variables A Transport and B Airlines are extracted.

The second model generation unit 401 identifies one unprocessed candidate among the candidate explanatory variables to be added (step S35). Using the learning data of the extracted explanatory variables and the learning data of the identified candidate, the second model generation unit 401 then constructs a prediction model for predicting the representative error values calculated for the group identified in step S31 (step S37). The second model generation unit 401 stores the group identifier, the candidate explanatory variable, and the data of the constructed prediction model in the second model storage unit 402. Step S37 uses the representative error values stored in the grouping result storage unit 3.
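The way steps S33 to S37 combine the group's existing explanatory variables with one candidate at a time can be sketched as follows. This shows only the assembly of the input table for each candidate; the model fitting itself is left out. The function name `build_feature_rows`, the dictionary layout, and all values are assumptions made for illustration.

```python
def build_feature_rows(base_series, candidate_values):
    """Combine the group's existing explanatory variables with one candidate.

    base_series: dict mapping variable name -> list of values, one per date.
    candidate_values: list of values for the candidate, one per date.
    """
    names = sorted(base_series)  # fixed column order
    return [
        [base_series[name][t] for name in names] + [candidate_values[t]]
        for t in range(len(candidate_values))
    ]


# Made-up values for two dates.
base = {"A Transport": [100.0, 101.0], "B Airlines": [200.0, 198.0]}
candidates = {"gasoline": [150.0, 152.0], "rice": [30.0, 30.5]}

# One feature table per candidate; each would be used to fit a separate
# model that predicts the group's representative error.
tables = {
    name: build_feature_rows(base, values)
    for name, values in candidates.items()
}
print(tables["gasoline"])  # [[100.0, 200.0, 150.0], [101.0, 198.0, 152.0]]
```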

FIG. 15 shows an example of the data stored in the second model storage unit 402. In the example of FIG. 15, the group identifier, the candidate explanatory variable to be added, and the prediction model data are stored.

The second error calculation unit 404 then calculates an error vector based on the error between the values calculated by the prediction model built in step S37 and the representative error values, and stores the group identifier, the candidate explanatory variable to be added, the value of each component of the error vector, and so on in the second error data storage unit 405 (step S39).

FIG. 16 shows an example of the data stored in the second error data storage unit 405. In the example of FIG. 16, the group identifier, the candidate explanatory variable to be added, and, for each date, the representative error, the predicted value, and the error are stored.

The first evaluation value calculation unit 407 then calculates an evaluation value representing the usefulness of the candidate by squaring each component of the error vector calculated in step S39 and summing the results (step S41). It stores the group identifier, the candidate explanatory variable to be added, and the evaluation value in the first evaluation value storage unit 408.
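The evaluation value of steps S39 and S41 can be illustrated with a minimal Python sketch. The function name `evaluation_value` and the list-based representation of the per-date series are assumptions made for illustration; the specification does not prescribe any particular implementation.

```python
def evaluation_value(representative_errors, predicted_errors):
    """Sum of squared components of the error vector (smaller means more useful).

    Both arguments are per-date sequences of equal length (an assumed layout).
    """
    if len(representative_errors) != len(predicted_errors):
        raise ValueError("series must have the same length")
    # Error vector: representative error minus the model's prediction, per date.
    error_vector = [r - p for r, p in zip(representative_errors, predicted_errors)]
    return sum(e * e for e in error_vector)
```

A smaller return value corresponds to a candidate whose model tracks the representative error more closely.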

FIG. 17 shows an example of the data stored in the first evaluation value storage unit 408. In the example of FIG. 17, the group identifier, the candidate explanatory variable to be added, and the evaluation value are stored.

The second model generation unit 401 then determines whether any unprocessed candidate remains (step S43). If one remains (step S43: Yes route), the process returns to step S35 to handle the next candidate.

If no unprocessed candidate remains (step S43: No route), the extraction unit 410 extracts, for the group identified in step S31, N candidate explanatory variables from the first evaluation value storage unit 408 in ascending order of evaluation value (step S45). The extraction unit 410 also stores the group identifier and the extracted candidates in the first extraction result storage unit 411.

FIG. 18 shows an example of the data stored in the first extraction result storage unit 411. In the example of FIG. 18, the group identifier and the candidate explanatory variables to be added are stored.

The second model generation unit 401 then determines whether any unprocessed group remains (step S47). If one remains (step S47: Yes route), the process returns to step S31 to handle the next group; if none remains (step S47: No route), the process returns to the calling process.

Through the above processing, it becomes possible to identify explanatory variables that improve accuracy particularly well when used together with the explanatory variables already used in the prediction model to which they are added.
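The per-group loop of steps S33 to S45 can be sketched as follows. This is a hedged illustration only: the specification does not state the model type, so ordinary least squares with an intercept is an assumption, and the names `solve_linear_system`, `fit_and_score`, and `extract_candidates` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; variables may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_and_score(target, features):
    """Fit target ~ intercept + features (assumed OLS); return the sum of squared errors."""
    rows = [[1.0] + [f[t] for f in features] for t in range(len(target))]
    k = len(rows[0])
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    predictions = [sum(c * x for c, x in zip(coef, r)) for r in rows]
    return sum((y - p) ** 2 for y, p in zip(target, predictions))


def extract_candidates(representative_error, existing_vars, candidates, n):
    """Return the n candidate names with the smallest evaluation values.

    existing_vars: name -> series already used by the group's models (step S33).
    candidates:    name -> series of a candidate explanatory variable (step S35).
    """
    scores = {
        name: fit_and_score(representative_error,
                            list(existing_vars.values()) + [series])
        for name, series in candidates.items()
    }
    return sorted(scores, key=scores.get)[:n]
```

Each candidate is scored together with the group's existing explanatory variables, which is what allows a candidate that is useful only in combination to rank highly.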

Returning to the description of FIG. 6, the candidate extraction unit 4 performs the second candidate extraction process (step S5). The second candidate extraction process is described with reference to FIGS. 19 to 24.

First, an overview of the second candidate extraction process is given with reference to FIG. 19. To simplify the description, the representative error vector is treated as a one-dimensional vector. As in the first candidate extraction process, the second candidate extraction process extracts N candidate explanatory variables (N is a natural number of 2 or more) for each group based on the accuracy with which the representative error is predicted. That is, an evaluation value is calculated as the sum of the squared differences between the representative error and the value predicted by the prediction model for the representative error, and N candidates are extracted in order starting from the candidate with the smallest evaluation value.

In the second candidate extraction process, however, the prediction model for predicting the representative error uses only the candidate explanatory variable to be added. In the example of FIG. 19, for the group containing A Transport and B Airlines, the prediction model is built using only the learning data of the candidate explanatory variable to be added (gasoline or rice). This is done to identify candidates that are useful for improving the accuracy of the prediction model under the assumption that multiple explanatory variables produce no synergistic effect. In the example of FIG. 19, the evaluation value calculated for gasoline is smaller than that calculated for rice, so gasoline is preferable to rice as a candidate explanatory variable to add for the group to which A Transport and B Airlines belong. Note that, because a prediction model usually becomes more accurate as the number of explanatory variables increases, the evaluation values calculated here are larger than those calculated in the first candidate extraction process.

Next, the processing flow of the second candidate extraction process is described. First, the second model generation unit 401 in the candidate extraction unit 4 identifies one unprocessed group among the groups registered in the grouping result storage unit 3 (FIG. 20: step S51).

The second model generation unit 401 identifies one unprocessed candidate among the candidate explanatory variables to be added (step S53). Using the learning data of the identified candidate, it then builds a prediction model for predicting the value of the representative error calculated for the group identified in step S51 (step S55). The second model generation unit 401 stores the group identifier, the candidate explanatory variable to be added, and the data of the built prediction model in the third model storage unit 403. Step S55 uses the representative error values stored in the grouping result storage unit 3.

FIG. 21 shows an example of the data stored in the third model storage unit 403. In the example of FIG. 21, the group identifier, the candidate explanatory variable to be added, and the prediction model data are stored.

The second error calculation unit 404 then calculates an error vector based on the error between the values calculated by the prediction model built in step S55 and the representative error values (step S57). It stores the group identifier, the candidate explanatory variable to be added, the value of each component of the error vector, and so on in the third error data storage unit 406.

FIG. 22 shows an example of the data stored in the third error data storage unit 406. In the example of FIG. 22, the group identifier, the candidate explanatory variable to be added, and, for each date, the representative error, the predicted value, and the error are stored.

The first evaluation value calculation unit 407 then calculates an evaluation value representing the usefulness of the candidate by squaring each component of the error vector calculated in step S57 and summing the results (step S59). It stores the group identifier, the candidate explanatory variable to be added, and the evaluation value in the second evaluation value storage unit 409.

FIG. 23 shows an example of the data stored in the second evaluation value storage unit 409. In the example of FIG. 23, the group identifier, the candidate explanatory variable to be added, and the evaluation value are stored.

The second model generation unit 401 then determines whether any unprocessed candidate remains (step S61). If one remains (step S61: Yes route), the process returns to step S53 to handle the next candidate.

If no unprocessed candidate remains (step S61: No route), the extraction unit 410 extracts, for the group identified in step S51, N candidate explanatory variables from the second evaluation value storage unit 409 in ascending order of evaluation value (step S63). The extraction unit 410 also stores the group identifier and the extracted candidates in the second extraction result storage unit 412.

FIG. 24 shows an example of the data stored in the second extraction result storage unit 412. In the example of FIG. 24, the group identifier and the candidate explanatory variables to be added are stored.

The second model generation unit 401 then determines whether any unprocessed group remains (step S65). If one remains (step S65: Yes route), the process returns to step S51 to handle the next group; if none remains (step S65: No route), the process returns to the calling process.

Through the above processing, it becomes possible to identify candidates that are useful for improving the accuracy of the prediction model under the assumption that multiple explanatory variables produce no synergistic effect.

Returning to the description of FIG. 6, the first specifying unit 413 in the candidate extraction unit 4 determines the final candidates for each group based on the results of the first and second candidate extraction processes, and stores the group identifier and the final candidates in the candidate storage unit 6 (step S7). Specifically, the candidate explanatory variables that are stored in both the first extraction result storage unit 411 and the second extraction result storage unit 412 are determined to be the final candidates.
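Step S7 amounts to a per-group intersection of the two extraction results. A minimal sketch, assuming each result is held as a dictionary from group identifier to a list of candidate names (the function name `final_candidates` and this layout are illustrative, not taken from the specification):

```python
def final_candidates(first_result, second_result):
    """Keep, per group, the candidates present in both extraction results."""
    finals = {}
    for group, first in first_result.items():
        second = set(second_result.get(group, []))
        # Preserve the order of the first extraction result.
        finals[group] = [c for c in first if c in second]
    return finals


first = {1: ["gasoline", "light oil", "rice"], 2: ["wheat", "rice"]}
second = {1: ["light oil", "gasoline", "sugar"], 2: ["rice", "corn"]}
print(final_candidates(first, second))
# {1: ['gasoline', 'light oil'], 2: ['rice']}
```

The candidate names in the usage example are made up for illustration.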

FIG. 25 shows an example of the data stored in the candidate storage unit 6. In the example of FIG. 25, the group identifier and the candidate explanatory variables to be added are stored.

The determination unit 7 then performs the determination process (step S9). The determination process is described with reference to FIGS. 26 to 31.

First, an overview of the determination process is given with reference to FIG. 26. In the determination process, for each stock, each final candidate for the group to which the stock belongs is actually added to the prediction model and an evaluation value is calculated; the explanatory variable whose evaluation value indicates the highest usefulness is then chosen as the variable to add to the prediction model. For example, when gasoline and light oil are the final candidates for group 1, each of them is actually added to the prediction model, an evaluation value is calculated for each, and the one with the smallest evaluation value is chosen as the explanatory variable to add. In the example of FIG. 26, the evaluation value calculated for gasoline is smaller than that calculated for light oil, so gasoline is chosen as the explanatory variable to add to the prediction model of A Transport.

Next, the processing flow of the determination process is described. First, the third model generation unit 71 in the determination unit 7 identifies one unprocessed stock among the stocks whose learning data is stored in the learning data storage unit 5 (FIG. 27: step S71). The third model generation unit 71 also identifies one unprocessed final candidate among the final candidates for the group to which the stock identified in step S71 belongs (step S73). In step S73, the group to which the stock identified in step S71 belongs is first identified from the grouping result storage unit 3, the final candidates corresponding to that group are identified from the candidate storage unit 6, and an unprocessed final candidate is selected from among them.

Using the learning data of the stock identified in step S71 and the learning data of the final candidate identified in step S73, the third model generation unit 71 then builds a prediction model for predicting "today's" stock price of the identified stock (step S75). It stores the stock name, the candidate explanatory variable to be added, and the data of the built prediction model in the fourth model storage unit 72.

FIG. 28 shows an example of the data stored in the fourth model storage unit 72. In the example of FIG. 28, the stock name, the candidate explanatory variable to be added, and the prediction model data are stored.

The third error calculation unit 73 then calculates an error vector based on the error between the actual stock price of the identified stock and the value calculated by the prediction model built in step S75, and stores the stock name, the candidate explanatory variable to be added, the value of each component of the error vector, and so on in the fourth error data storage unit 74 (step S77).

FIG. 29 shows an example of the data stored in the fourth error data storage unit 74. In the example of FIG. 29, the stock name, the candidate explanatory variable to be added, and, for each date, the actual stock price, the predicted value, and the error are stored.

The second evaluation value calculation unit 75 then calculates an evaluation value representing the usefulness of the candidate by squaring each component of the error vector calculated in step S77 and summing the results, and stores the stock name, the candidate explanatory variable to be added, and the evaluation value in the third evaluation value storage unit 76 (step S79).

FIG. 30 shows an example of the data stored in the third evaluation value storage unit 76. In the example of FIG. 30, the stock name, the candidate explanatory variable to be added, and the evaluation value are stored.

The third model generation unit 71 then determines whether any unprocessed candidate remains in the candidate storage unit 6 (step S81). If one remains (step S81: Yes route), the process returns to step S73 to handle the next candidate.

If no unprocessed candidate remains (step S81: No route), the second specifying unit 77 determines the explanatory variable to add to the prediction model of the stock identified in step S71 based on the evaluation values stored in the third evaluation value storage unit 76, and stores that explanatory variable in the determination result storage unit 8 in association with the stock name (step S83). Specifically, the explanatory variable with the smallest evaluation value is chosen.

FIG. 31 shows an example of the data stored in the determination result storage unit 8. In the example of FIG. 31, the stock name and the explanatory variable to be added to the prediction model are stored.

The third model generation unit 71 then determines whether any unprocessed stock remains (step S85). If one remains (step S85: Yes route), the process returns to step S71 to handle the next stock. If none remains (step S85: No route), the process returns to the calling process.

Through the above processing, it becomes possible to identify the most suitable explanatory variable for each stock.
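The determination process of steps S71 to S85 can be sketched as a loop over stocks that evaluates only the final candidates of each stock's group. In this hedged sketch the `evaluate` callback stands in for steps S75 to S79 (building the model and computing the sum of squared errors); the callback, the function name `decide_variables`, and the dictionary layouts are assumptions for illustration.

```python
def decide_variables(stock_to_group, final_candidates, evaluate):
    """Pick, for each stock, the final candidate with the smallest evaluation value.

    evaluate(stock, candidate) is a hypothetical callback returning the
    evaluation value obtained when the candidate is added to the stock's model.
    """
    decisions = {}
    for stock, group in stock_to_group.items():
        candidates = final_candidates.get(group, [])
        if candidates:  # a group may have no final candidate
            decisions[stock] = min(candidates, key=lambda c: evaluate(stock, c))
    return decisions


# Illustrative evaluation values only; they are not taken from the figures.
table = {
    ("A Transport", "gasoline"): 1.2, ("A Transport", "light oil"): 3.4,
    ("B Airlines", "gasoline"): 2.5, ("B Airlines", "light oil"): 0.9,
}
result = decide_variables(
    {"A Transport": 1, "B Airlines": 1},
    {1: ["gasoline", "light oil"]},
    lambda stock, cand: table[(stock, cand)],
)
print(result)  # {'A Transport': 'gasoline', 'B Airlines': 'light oil'}
```

Because the choice is made per stock, two stocks in the same group may end up with different explanatory variables even though they share the same final candidates.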

Returning to the description of FIG. 6, the output unit 9 generates data for a screen that displays the determination result, using the data stored in the grouping result storage unit 3, the data stored in the candidate storage unit 6, and the data stored in the determination result storage unit 8. It then causes the display device to display the generated screen data (step S10), and the process ends.

FIG. 32 shows an example of the displayed screen. In the example of FIG. 32, the screen shows, for each group, the explanatory variables to be added to the prediction models of the stocks belonging to the group, the candidate explanatory variables for those prediction models, and the representative error vector calculated for the group.

As described above, determining the explanatory variable to be added to a prediction model for each group rather than for each stock reduces the amount of calculation performed when determining that explanatory variable.

Here, the processing performed when the explanatory variable to be added to the prediction model is determined for each stock is briefly described with reference to FIG. 33. First, a processing unit (not shown) in the information processing apparatus 1 identifies one unprocessed stock (FIG. 33: step S101). The processing unit identifies one unprocessed candidate among the candidate explanatory variables to be added (step S103). The processing unit builds a prediction model using the learning data of the identified stock and the learning data of the identified candidate (step S105). The processing unit calculates an error vector from the values predicted by the built prediction model and the actual values (step S107). The processing unit calculates an evaluation value representing the usefulness of the candidate by squaring the value of each component of the error vector and summing the results (step S109). If an unprocessed candidate remains (step S111: Yes route), the process returns to step S103; if none remains (step S111: No route), the explanatory variable whose evaluation value indicates the highest usefulness is identified (step S113). If an unprocessed stock remains (step S115: Yes route), the process returns to step S101; if none remains (step S115: No route), the process ends.

As stated in the Background Art section, the total amount of calculation in this case is approximately (amount of calculation required to calculate the evaluation value for one explanatory variable) × (number of objective variables, that is, the number of stocks) × (number of candidate explanatory variables). Consequently, the amount of calculation becomes very large as the number of objective variables and the number of explanatory variables increase.

In contrast, with the processing of the present embodiment described above, the total amount of calculation is approximately (amount of calculation required for the grouping process) + (amount of calculation required to calculate the evaluation value for one explanatory variable) × (number of groups) × (number of candidate explanatory variables) + (amount of calculation required to calculate the evaluation value for one explanatory variable) × (number of groups) × (number of stocks (objective variables) included in a group) × (number of final candidate explanatory variables). The amount of calculation can thereby be reduced.
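The two approximate cost expressions can be compared numerically. The sketch below simply encodes the two formulas as stated; the function names and all input figures (1000 stocks, 500 candidates, 20 groups of 50 stocks, 5 final candidates, a grouping cost of 1000 units) are assumptions chosen for illustration and do not come from the specification.

```python
def per_stock_cost(unit_cost, num_stocks, num_candidates):
    """Approximate cost when every candidate is evaluated for every stock."""
    return unit_cost * num_stocks * num_candidates


def grouped_cost(grouping_cost, unit_cost, num_groups, stocks_per_group,
                 num_candidates, num_final):
    """Approximate cost of the group-based procedure of the embodiment."""
    group_stage = unit_cost * num_groups * num_candidates
    stock_stage = unit_cost * num_groups * stocks_per_group * num_final
    return grouping_cost + group_stage + stock_stage


# Hypothetical figures for illustration only.
print(per_stock_cost(1, 1000, 500))             # 500000
print(grouped_cost(1000, 1, 20, 50, 500, 5))    # 16000
```

Under these assumed figures the group-based procedure needs roughly 3% of the per-stock cost; the actual saving depends on the number of groups, the number of final candidates, and the cost of grouping.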

An embodiment of the present technology has been described above, but the present technology is not limited to it. For example, the functional block configuration of the information processing apparatus 1 described above does not necessarily correspond to an actual program module configuration.

The configuration of each table described above is also only an example and need not be as shown. In the processing flows, the order of the steps may be changed as long as the processing result does not change. Steps may also be executed in parallel.

Although stock price data was used as the time-series data, the data is not limited to stock prices, and the present embodiment can be applied to other time-series data as well.

The data used may also be data other than time-series data. That is, the objective variables and the explanatory variables need not be associated with time.

In step S7, the candidate explanatory variables stored in both the first extraction result storage unit 411 and the second extraction result storage unit 412 were taken as the final candidates, but the final candidates may be determined by other methods. For example, a predetermined number of explanatory variables may be taken as the final candidates in ascending order of the sum of the evaluation value calculated in the first candidate extraction process and the evaluation value calculated in the second candidate extraction process.
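The alternative selection rule based on the sum of the two evaluation values can be sketched for a single group as follows. The function name `final_candidates_by_sum`, the dictionary layout (candidate name to evaluation value), and the numeric values are assumptions for illustration.

```python
def final_candidates_by_sum(first_scores, second_scores, count):
    """Rank candidates by the sum of both evaluation values; keep the best `count`."""
    # Only candidates evaluated in both processes can be ranked.
    common = [c for c in first_scores if c in second_scores]
    return sorted(common, key=lambda c: first_scores[c] + second_scores[c])[:count]


first = {"gasoline": 1.0, "light oil": 2.0, "rice": 5.0}
second = {"gasoline": 3.0, "light oil": 1.5, "rice": 6.0}
print(final_candidates_by_sum(first, second, 2))  # ['light oil', 'gasoline']
```

Unlike the intersection rule, this variant always yields the requested number of final candidates as long as enough candidates were evaluated in both processes.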

The processing of the information processing apparatus 1 may also be executed by a plurality of computers.

The information processing apparatus 1 described above is a computer apparatus in which, as shown in FIG. 34, a memory 2501, a CPU (Central Processing Unit) 2503, a hard disk drive (HDD) 2505, a display control unit 2507 connected to a display device 2509, a drive device 2513 for a removable disk 2511, an input device 2515, and a communication control unit 2517 for connecting to a network are connected by a bus 2519. An operating system (OS) and an application program for carrying out the processing of this embodiment are stored in the HDD 2505 and are read from the HDD 2505 into the memory 2501 when executed by the CPU 2503. The CPU 2503 controls the display control unit 2507, the communication control unit 2517, and the drive device 2513 according to the processing of the application program so that they perform predetermined operations. Data being processed is stored mainly in the memory 2501 but may also be stored in the HDD 2505. In this embodiment of the present technology, the application program for carrying out the processing described above is stored on and distributed via the computer-readable removable disk 2511 and is installed from the drive device 2513 onto the HDD 2505. The program may also be installed onto the HDD 2505 via a network such as the Internet and the communication control unit 2517. Such a computer apparatus realizes the various functions described above through the organic cooperation of hardware such as the CPU 2503 and the memory 2501 with programs such as the OS and the application program.

The embodiment of the present technology described above can be summarized as follows.

The information processing apparatus according to the present embodiment includes:

(A) a storage device; (B) a first calculation unit that, for each of a plurality of objective variables, calculates the error between the actual value of the objective variable and the value calculated by a first prediction model for predicting the value of that objective variable, and stores the error in the storage device; (C) a second calculation unit that classifies the plurality of objective variables into a plurality of groups based on the errors stored in the storage device, calculates, for each of the plurality of groups, a representative value of the error using the errors calculated for the objective variables belonging to that group, and stores the representative value in the storage device; and (D) a first determination unit that, for each of the plurality of groups, generates a plurality of second prediction models for predicting the representative value stored in the storage device while varying the explanatory variables, and determines the explanatory variable to be added to the first prediction models of the objective variables belonging to that group based on the difference between each of the values calculated by the generated second prediction models and the representative value.

このように、第１の予測モデルに追加する説明変数を目的変数毎に決定するのではなく、グループ毎に決定することで、第１の予測モデルに追加する説明変数を決定する際に行う計算の量を削減することができるようになる。 In this way, the calculation performed when determining the explanatory variable to be added to the first prediction model by determining the explanatory variable to be added to the first prediction model by determining for each group instead of determining the explanatory variable to be added to the first prediction model. The amount of can be reduced.

The first determination unit described above may include (d1) a third calculation unit that, for each of a plurality of explanatory variable candidates, generates a second prediction model using the candidate together with the explanatory variables included in the first prediction models of the objective variables belonging to the group, calculates a first difference between the value calculated by that second prediction model and the representative value, and stores the first difference in the storage device; and (d2) a second determination unit that determines, based on the first differences calculated by the third calculation unit, an explanatory variable to be added to the first prediction models of the objective variables belonging to the group from among the plurality of explanatory variable candidates. When several explanatory variables are included in the same prediction model, a synergistic effect can greatly improve prediction accuracy. This configuration therefore makes it less likely that an explanatory variable that greatly improves prediction accuracy when used together with the explanatory variables already in the first prediction model is overlooked.
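As a rough illustration of (d1) and (d2), the following Python sketch fits one second prediction model per candidate and keeps the candidate with the smallest first difference. Ordinary least squares, the sum of squared differences as the measure of difference, and all names and data (`solve`, `fit_predict`, `first_difference`, `holiday`, `temperature`) are assumptions made for illustration; the text above does not fix the model type or the difference measure.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_predict(columns, y):
    """Least-squares fit of y on the given columns plus an intercept; returns fitted values."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return [sum(b * v for b, v in zip(beta, r)) for r in rows]


def first_difference(existing, candidate, representative):
    """Second model = existing explanatory variables + one candidate (assumed OLS)."""
    fitted = fit_predict(existing + [candidate], representative)
    return sum((f - r) ** 2 for f, r in zip(fitted, representative))


# Hypothetical data: one existing explanatory variable and two candidates.
existing = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
candidates = {
    "temperature": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
    "holiday": [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
}
# Representative error of one group, built here as 0.5 * existing + 2 * holiday.
representative = [0.5, 3.0, 1.5, 2.0, 4.5, 3.0]

diffs = {
    name: first_difference(existing, column, representative)
    for name, column in candidates.items()
}
best = min(diffs, key=diffs.get)
print(best, diffs)
```

Because the representative error in this toy data is an exact linear combination of the existing variable and `holiday`, that candidate leaves essentially no remaining difference and is selected.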

The third calculation unit described above may also (d11), for each of the plurality of explanatory variable candidates, generate a second prediction model using that candidate, calculate a second difference between the value calculated by that second prediction model and the representative value, and store the second difference in the storage device. The apparatus may then further include (E) a third determination unit that determines, based on the first differences and the second differences calculated by the third calculation unit, an explanatory variable to be added to the first prediction models of the objective variables belonging to the group from among the plurality of explanatory variable candidates. This makes it less likely that an explanatory variable that is effective for improving prediction accuracy, on the assumption that it has no synergy with the explanatory variables in the first prediction model, is overlooked.
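One way to read (d11) and (E) is sketched below in Python: the second difference is obtained from a model fitted on the candidate alone, and the final shortlist is the union of the best candidates under each criterion. The single-variable least-squares fit, the union rule, the given first-difference values, and all names (`second_difference`, `shortlist`, `rainfall`, `holiday`) are assumptions for illustration; the text above only states that the determination is based on both differences.

```python
def second_difference(candidate, representative):
    """Fit representative = a + b * candidate by least squares; return the squared-error sum."""
    n = len(candidate)
    mx = sum(candidate) / n
    my = sum(representative) / n
    sxx = sum((v - mx) ** 2 for v in candidate)
    sxy = sum((v - mx) * (w - my) for v, w in zip(candidate, representative))
    b = sxy / sxx
    a = my - b * mx
    return sum((a + b * v - w) ** 2 for v, w in zip(candidate, representative))


def shortlist(first_diff, second_diff, top_n=1):
    """Union of the top_n candidates under each criterion (an assumed combination rule)."""
    by_first = sorted(first_diff, key=first_diff.get)[:top_n]
    by_second = sorted(second_diff, key=second_diff.get)[:top_n]
    # dict.fromkeys keeps insertion order and drops duplicates.
    return list(dict.fromkeys(by_first + by_second))


# Hypothetical representative error of one group and two candidates.
representative = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "rainfall": [2.0, 4.0, 6.0, 8.0],
    "holiday": [0.0, 1.0, 0.0, 1.0],
}
# First differences are taken as given here (hypothetical values).
first_diff = {"rainfall": 0.9, "holiday": 0.2}
second_diff = {
    name: second_difference(column, representative)
    for name, column in candidates.items()
}
selected = shortlist(first_diff, second_diff)
print(selected)
```

In this toy data `holiday` wins on the first difference and `rainfall` on the second, so both survive; a candidate that helps only on its own is not dropped merely because it shows no synergy.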

The first determination unit described above may also (d3) determine a plurality of explanatory variables to be added to the first prediction model. The apparatus may then further include (F) a fourth calculation unit that, for each of the plurality of objective variables, generates a plurality of third prediction models for predicting the value of the objective variable, each using the explanatory variables included in the first prediction model of that objective variable together with one of the plurality of explanatory variables determined by the first determination unit for the group to which the objective variable belongs, calculates an error between each of the values calculated by the third prediction models and the actual value of the objective variable, and stores the errors in the storage device; and (G) a fourth determination unit that, for each of the plurality of objective variables, determines, based on the errors calculated by the fourth calculation unit, the most appropriate explanatory variable to be added to the first prediction model from among the plurality of explanatory variables determined by the first determination unit. This makes it possible to identify, for each objective variable, the explanatory variable most effective for improving prediction accuracy.
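The refinement in (F) and (G) can be sketched as follows: for each objective variable, one third prediction model is fitted per shortlisted explanatory variable, and the variable with the smallest error against the actual values is kept. This is a minimal Python sketch assuming ordinary least squares and a squared-error sum; the helper names, variable names, and data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_predict(columns, y):
    """Least-squares fit of y on the given columns plus an intercept; returns fitted values."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return [sum(b * v for b, v in zip(beta, r)) for r in rows]


def best_variable(existing, shortlisted, actual):
    """Return the shortlisted variable whose third model has the smallest squared error."""
    errors = {}
    for name, column in shortlisted.items():
        fitted = fit_predict(existing + [column], actual)
        errors[name] = sum((f - a) ** 2 for f, a in zip(fitted, actual))
    return min(errors, key=errors.get)


# Hypothetical group with two objective variables sharing one existing variable.
existing = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
shortlisted = {
    "holiday": [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    "rainfall": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
actual = {
    "item_a": [1.0, 5.0, 3.0, 4.0, 8.0, 6.0],      # existing + 3 * holiday
    "item_c": [3.5, 4.5, 8.0, 8.5, 12.5, 16.5],    # 2 * existing + 0.5 * rainfall
}
best = {name: best_variable(existing, shortlisted, y) for name, y in actual.items()}
print(best)
```

Only the shortlisted variables of the group are tried per objective variable, so this step stays far cheaper than searching every candidate for every objective variable.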

The second calculation unit described above may (c1) classify the plurality of objective variables into a plurality of groups by clustering based on the calculated errors. For example, using the k-means method allows the plurality of objective variables to be classified appropriately.
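The clustering in (c1) could look like the following minimal k-means sketch in Python. Treating each objective variable's errors as a vector over time steps, seeding the centroids with the first k vectors, and the names `kmeans`, `errors`, and `item_a` to `item_d` are assumptions for illustration only.

```python
import math


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: attach each error vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical errors of the first prediction models, one vector per objective variable.
errors = {
    "item_a": [5.0, 4.0, 6.0],
    "item_b": [-3.0, -2.0, -4.0],
    "item_c": [4.5, 5.0, 5.5],
    "item_d": [-2.5, -3.5, -3.0],
}
names = list(errors)
labels, centroids = kmeans([errors[n] for n in names], k=2)

groups = {}
for name, label in zip(names, labels):
    groups.setdefault(label, []).append(name)
print(groups)
```

Objective variables whose first prediction models err in a similar way end up in the same group, which is what allows a single explanatory variable search per group.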

The representative value of the error described above may be the average of the errors calculated for the objective variables belonging to the group. This yields a reasonable representative value. The representative value is not limited to the average; for example, the median may be used instead.
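A minimal Python sketch of both options follows, assuming each objective variable's errors form a series over the same time steps and the representative value is taken step by step. The function name `representative` and the data are hypothetical.

```python
from statistics import mean, median


def representative(error_series, how="mean"):
    """Aggregate the errors of a group's objective variables at each time step."""
    agg = mean if how == "mean" else median
    # zip(*...) walks the series time step by time step.
    return [agg(step) for step in zip(*error_series)]


# Hypothetical errors of three objective variables in one group.
group_errors = [
    [5.0, 4.0, 6.0],
    [4.0, 5.0, 5.0],
    [9.0, 3.0, 7.0],
]
rep_mean = representative(group_errors)
rep_median = representative(group_errors, "median")
print(rep_mean, rep_median)
```

The median is less sensitive to a single objective variable with an unusually large error, such as the 9.0 in the first time step above.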

The information processing method according to the present embodiment includes processing that (H) for each of a plurality of objective variables, calculates an error between the actual value of the objective variable and the value calculated by a first prediction model for predicting the value of that objective variable, and stores the error in a storage device; (I) classifies the plurality of objective variables into a plurality of groups based on the errors stored in the storage device; (J) for each of the groups, calculates a representative value of the error using the errors calculated for the objective variables belonging to that group, and stores the representative value in the storage device; and (K) for each of the groups, generates a plurality of second prediction models for predicting the representative value stored in the storage device while changing the explanatory variables, and determines, based on the difference between each of the values calculated by the generated second prediction models and the representative value, an explanatory variable to be added to the first prediction models of the objective variables belonging to that group.

A program for causing a computer to perform the processing of the above method can be created. The program is stored in a computer-readable storage medium or storage device such as a flexible disk, a CD-ROM, a magneto-optical disk, a semiconductor memory, or a hard disk. Intermediate processing results are temporarily stored in a storage device such as a main memory.

The following appendices are further disclosed with respect to the embodiments, including the examples above.

(Appendix 1)
An information processing apparatus comprising:
a storage device;
a first calculation unit that, for each of a plurality of objective variables, calculates an error between the actual value of the objective variable and the value calculated by a first prediction model for predicting the value of the objective variable, and stores the error in the storage device;
a second calculation unit that classifies the plurality of objective variables into a plurality of groups based on the errors stored in the storage device and, for each of the plurality of groups, calculates a representative value of the error using the errors calculated for the objective variables belonging to the group, and stores the representative value in the storage device; and
a first determination unit that, for each of the plurality of groups, generates a plurality of second prediction models for predicting the representative value stored in the storage device while changing the explanatory variables, and determines, based on a difference between each of the values calculated by the generated second prediction models and the representative value, an explanatory variable to be added to the first prediction model of the objective variables belonging to the group.

(Appendix 2)
The information processing apparatus according to appendix 1, wherein the first determination unit includes:
a third calculation unit that, for each of a plurality of explanatory variable candidates, generates the second prediction model using the candidate and the explanatory variables included in the first prediction model of the objective variables belonging to the group, calculates a first difference between the value calculated by the second prediction model and the representative value, and stores the first difference in the storage device; and
a second determination unit that determines, based on the first difference calculated by the third calculation unit, an explanatory variable to be added to the first prediction model of the objective variables belonging to the group from among the plurality of explanatory variable candidates.

(Appendix 3)
The information processing apparatus according to appendix 2, wherein the third calculation unit,
for each of the plurality of explanatory variable candidates, generates the second prediction model using the candidate, calculates a second difference between the value calculated by the second prediction model and the representative value, and stores the second difference in the storage device,
the apparatus further comprising a third determination unit that determines, based on the first difference and the second difference calculated by the third calculation unit, an explanatory variable to be added to the first prediction model of the objective variables belonging to the group from among the plurality of explanatory variable candidates.

(Appendix 4)
The information processing apparatus according to any one of appendices 1 to 3, wherein the first determination unit
determines a plurality of explanatory variables to be added to the first prediction model,
the apparatus further comprising:
a fourth calculation unit that, for each of the plurality of objective variables, generates a plurality of third prediction models for predicting the value of the objective variable, each using the explanatory variables included in the first prediction model of the objective variable and one of the plurality of explanatory variables determined by the first determination unit for the group to which the objective variable belongs, calculates an error between each of the values calculated by the third prediction models and the actual value of the objective variable, and stores the errors in the storage device; and
a fourth determination unit that, for each of the plurality of objective variables, determines, based on the errors calculated by the fourth calculation unit, the most appropriate explanatory variable to be added to the first prediction model from among the plurality of explanatory variables determined by the first determination unit.

(Appendix 5)
The information processing apparatus according to any one of appendices 1 to 4, wherein the second calculation unit
classifies the plurality of objective variables into a plurality of groups by clustering based on the calculated errors.

(Appendix 6)
The information processing apparatus according to any one of appendices 1 to 5, wherein the representative value of the error is the average of the errors calculated for the objective variables belonging to the group.

(Appendix 7)
An information processing method in which a computer executes processing comprising:
calculating, for each of a plurality of objective variables, an error between the actual value of the objective variable and the value calculated by a first prediction model for predicting the value of the objective variable, and storing the error in a storage device;
classifying the plurality of objective variables into a plurality of groups based on the errors stored in the storage device;
calculating, for each of the plurality of groups, a representative value of the error using the errors calculated for the objective variables belonging to the group, and storing the representative value in the storage device; and
generating, for each of the plurality of groups, a plurality of second prediction models for predicting the representative value stored in the storage device while changing the explanatory variables, and determining, based on a difference between each of the values calculated by the generated second prediction models and the representative value, an explanatory variable to be added to the first prediction model of the objective variables belonging to the group.

(Appendix 8)
A program for causing a computer to execute processing comprising:
calculating, for each of a plurality of objective variables, an error between the actual value of the objective variable and the value calculated by a first prediction model for predicting the value of the objective variable, and storing the error in a storage device;
classifying the plurality of objective variables into a plurality of groups based on the errors stored in the storage device;
calculating, for each of the plurality of groups, a representative value of the error using the errors calculated for the objective variables belonging to the group, and storing the representative value in the storage device; and
generating, for each of the plurality of groups, a plurality of second prediction models for predicting the representative value stored in the storage device while changing the explanatory variables, and determining, based on a difference between each of the values calculated by the generated second prediction models and the representative value, an explanatory variable to be added to the first prediction model of the objective variables belonging to the group.

DESCRIPTION OF SYMBOLS
1 Information processing apparatus
2 Grouping processing unit
3 Grouping result storage unit
4 Candidate extraction unit
5 Learning data storage unit
6 Candidate storage unit
7 Determination unit
8 Determination result storage unit
9 Output unit
21 First model generation unit
22 First model storage unit
23 First error calculation unit
24 First error data storage unit
25 Group generation unit
401 Second model generation unit
402 Second model storage unit
403 Third model storage unit
404 Second error calculation unit
405 Second error data storage unit
406 Third error data storage unit
407 First evaluation value calculation unit
408 First evaluation value storage unit
409 Second evaluation value storage unit
410 Extraction unit
411 First extraction result storage unit
412 Second extraction result storage unit
413 First identification unit
71 Third model generation unit
72 Fourth model storage unit
73 Third error calculation unit
74 Fourth error data storage unit
75 Second evaluation value calculation unit
76 Third evaluation value storage unit
77 Second identification unit

Claims

An information processing apparatus comprising:
a storage device;
a first calculation unit that, for each of a plurality of objective variables, calculates an error between the actual value of the objective variable and the value calculated by a first prediction model for predicting the value of the objective variable, and stores the error in the storage device;
a second calculation unit that classifies the plurality of objective variables into a plurality of groups based on the errors stored in the storage device and, for each of the plurality of groups, calculates a representative value of the error using the errors calculated for the objective variables belonging to the group, and stores the representative value in the storage device; and
a first determination unit that, for each of the plurality of groups, generates a plurality of second prediction models for predicting the representative value stored in the storage device while changing the explanatory variables, and determines, based on a difference between each of the values calculated by the generated second prediction models and the representative value, an explanatory variable to be added to the first prediction model of the objective variables belonging to the group.

The information processing apparatus according to claim 1, wherein the first determination unit includes:
a third calculation unit that, for each of a plurality of explanatory variable candidates, generates the second prediction model using the candidate and the explanatory variables included in the first prediction model of the objective variables belonging to the group, calculates a first difference between the value calculated by the second prediction model and the representative value, and stores the first difference in the storage device; and
a second determination unit that determines, based on the first difference calculated by the third calculation unit, an explanatory variable to be added to the first prediction model of the objective variables belonging to the group from among the plurality of explanatory variable candidates.

The information processing apparatus according to claim 2, wherein the third calculation unit,
for each of the plurality of explanatory variable candidates, generates the second prediction model using the candidate, calculates a second difference between the value calculated by the second prediction model and the representative value, and stores the second difference in the storage device,
the apparatus further comprising a third determination unit that determines, based on the first difference and the second difference calculated by the third calculation unit, an explanatory variable to be added to the first prediction model of the objective variables belonging to the group from among the plurality of explanatory variable candidates.

The information processing apparatus according to any one of claims 1 to 3, wherein the first determination unit
determines a plurality of explanatory variables to be added to the first prediction model,
the apparatus further comprising:
a fourth calculation unit that, for each of the plurality of objective variables, generates a plurality of third prediction models for predicting the value of the objective variable, each using the explanatory variables included in the first prediction model of the objective variable and one of the plurality of explanatory variables determined by the first determination unit for the group to which the objective variable belongs, calculates an error between each of the values calculated by the third prediction models and the actual value of the objective variable, and stores the errors in the storage device; and
a fourth determination unit that, for each of the plurality of objective variables, determines, based on the errors calculated by the fourth calculation unit, the most appropriate explanatory variable to be added to the first prediction model from among the plurality of explanatory variables determined by the first determination unit.

An information processing method in which a computer executes processing comprising:
calculating, for each of a plurality of objective variables, an error between the actual value of the objective variable and the value calculated by a first prediction model for predicting the value of the objective variable, and storing the error in a storage device;
classifying the plurality of objective variables into a plurality of groups based on the errors stored in the storage device;
calculating, for each of the plurality of groups, a representative value of the error using the errors calculated for the objective variables belonging to the group, and storing the representative value in the storage device; and
generating, for each of the plurality of groups, a plurality of second prediction models for predicting the representative value stored in the storage device while changing the explanatory variables, and determining, based on a difference between each of the values calculated by the generated second prediction models and the representative value, an explanatory variable to be added to the first prediction model of the objective variables belonging to the group.

A program for causing a computer to execute processing comprising:
calculating, for each of a plurality of objective variables, an error between the actual value of the objective variable and the value calculated by a first prediction model for predicting the value of the objective variable, and storing the error in a storage device;
classifying the plurality of objective variables into a plurality of groups based on the errors stored in the storage device;
calculating, for each of the plurality of groups, a representative value of the error using the errors calculated for the objective variables belonging to the group, and storing the representative value in the storage device; and
generating, for each of the plurality of groups, a plurality of second prediction models for predicting the representative value stored in the storage device while changing the explanatory variables, and determining, based on a difference between each of the values calculated by the generated second prediction models and the representative value, an explanatory variable to be added to the first prediction model of the objective variables belonging to the group.