JP7795908B2

JP7795908B2 - Data analysis device and data analysis method

Info

Publication number: JP7795908B2
Application number: JP2021211994A
Authority: JP
Inventors: 大河能見; 彰信瀬里
Original assignee: Keyence Corp
Current assignee: Keyence Corp
Priority date: 2021-12-27
Filing date: 2021-12-27
Publication date: 2026-01-08
Anticipated expiration: 2041-12-27
Also published as: JP2023096330A

Description

The present disclosure relates to a data analysis device and a data analysis method.

There are various methods of data analysis, and the data formats and tools suited to each method differ. For example, when the various data held by a company are aggregated and visualized with a BI (business intelligence) tool, it is generally recommended to hold the pre-aggregation data in a format known as a star schema. When analysis using machine learning is performed, on the other hand, the data must be aggregated and joined in advance into a single table. This aggregation and joining is called feature extraction, and it is known to be one reason why analysis using machine learning takes a long time.
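The aggregation and joining described above can be sketched as follows. This is a minimal illustration only, not the method of the disclosure: the table contents and the names `customers`, `orders`, and `extract_features` are hypothetical, and only the Python standard library is used.

```python
from collections import defaultdict

# Hypothetical star-schema-like inputs: one dimension table and one fact table.
customers = [
    {"customer_id": 1, "industry": "manufacturing"},
    {"customer_id": 2, "industry": "retail"},
]
orders = [
    {"customer_id": 1, "amount": 100},
    {"customer_id": 1, "amount": 250},
    {"customer_id": 2, "amount": 80},
]


def extract_features(customers, orders):
    """Aggregate the fact table per customer, then join onto the dimension table."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in orders:
        totals[row["customer_id"]] += row["amount"]
        counts[row["customer_id"]] += 1
    flat = []
    for c in customers:
        cid = c["customer_id"]
        # One output row per customer: the single-table form used for machine learning.
        flat.append({**c, "order_count": counts[cid], "order_total": totals[cid]})
    return flat


flat_table = extract_features(customers, orders)
```

Each row of `flat_table` holds one customer together with its aggregated features, which is the single-table form that a learning algorithm typically expects.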

Furthermore, when machine learning is used, data from different aggregation periods are normally used for training and for prediction after training, so separate data sets must be prepared for training and for prediction.

As a result, data analysts currently use advanced programming, for example in SQL, to design and implement a separate data conversion process for each analytical purpose, and achieve the desired analysis by choosing the appropriate tool for each purpose.

To reduce the effort of data conversion in analysis, a technique is known that automatically generates target variables and features by automatically joining and aggregating multiple input data sets (see, for example, Patent Document 1).

Patent Document 1: Japanese Patent Application Laid-Open No. 2020-135054

When sales activity and sales data are analyzed in order to support sales activities, for example, the following uses of the data are conceivable in the field: routinely monitoring sales indicators in reports; identifying a part of a report that warrants a closer look and performing a finer aggregation by combining multiple axes; using machine learning to identify the events behind a business issue discovered in a report; using machine-learning predictions to identify areas with high potential for improvement and carrying out measures there; and checking the implementation status of those measures in reports.

However, the uses described above are not independent of each other; they are parts of an improvement process that is carried out repeatedly. Since a single analysis method generally cannot cover every use, realizing this improvement process requires identifying the analysis method for each step, selecting tools, designing and implementing the data conversion processes, and working out how data are passed between steps, all of which takes a great many man-hours.

Furthermore, even if a useful feature is discovered in the analysis at one step, reusing it at another step is difficult. Because the conversion process required for the analysis differs from step to step, the way the feature is aggregated and converted must be readjusted to suit each step, so the knowledge gained in one analysis is hard to carry over to the others.

In addition, solving a business issue generally requires a combination of several types of analysis, such as machine learning and aggregation, and a separate conversion process has been required for each analysis.

In addition, when a model is trained using machine learning, a phenomenon known as leakage can occur, in which part of the information from the aggregation period of the target variable is unintentionally included in the features. To prevent leakage, the periods of the data used for the features and for the target variable must be adjusted so that they do not overlap.
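The non-overlap condition can be checked mechanically. The following is a minimal sketch under the assumption that each period is a closed date range given as a `(start, end)` pair; the function names `periods_overlap` and `check_no_leak` are hypothetical and do not come from the disclosure.

```python
from datetime import date


def periods_overlap(start_a, end_a, start_b, end_b):
    """Return True if the closed ranges [start_a, end_a] and [start_b, end_b] share a day."""
    return start_a <= end_b and start_b <= end_a


def check_no_leak(feature_period, target_period):
    """Raise ValueError if the feature period overlaps the target period."""
    if periods_overlap(*feature_period, *target_period):
        raise ValueError("feature period overlaps target period: possible leakage")
    return True


# Features from January to June 2018, target from July to December 2018: no overlap.
ok = check_no_leak(
    (date(2018, 1, 1), date(2018, 6, 30)),
    (date(2018, 7, 1), date(2018, 12, 31)),
)
```

A check of this kind only guards against overlapping periods; it does not detect leakage that enters through other routes, such as a feature column derived from the target itself.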

On the other hand, when predictions are made with a trained model, the features must be recalculated relative to the point in time for which the prediction is actually wanted, so different values must be calculated for training and for prediction. Furthermore, in aggregation uses such as reports, the user usually wants to monitor the latest figures, so values often have to be calculated relative to the most recent date.

In other words, the appropriate aggregation period differs among model training, prediction, and aggregation uses, so the feature values obtained by the conversion process for one analysis cannot be reused as-is in another. An expert with programming knowledge, for example of SQL, has to adjust the aggregation period of the features, which takes man-hours and is also prone to error.
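The dependence of the aggregation period on a reference date can be illustrated with a small helper. This is a sketch under stated assumptions, not the adjustment logic of the disclosure: the reference date is assumed to be the first day of a month, the feature period is assumed to be a number of whole months ending the day before it, and the names `months_before` and `feature_window` are hypothetical.

```python
from datetime import date, timedelta


def months_before(d, months):
    """Return the first day of the month lying `months` months before d's month."""
    index = d.year * 12 + (d.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def feature_window(reference_date, months):
    """Feature period: `months` whole months ending the day before reference_date.

    Assumes reference_date is the first day of a month.
    """
    start = months_before(reference_date, months)
    end = reference_date - timedelta(days=1)
    return start, end


# The same feature definition evaluated against different reference dates.
training_window = feature_window(date(2018, 7, 1), 6)
prediction_window = feature_window(date(2019, 1, 1), 6)
```

Here only the reference date changes between training and prediction; the definition of the feature stays the same, which is the kind of adjustment that otherwise has to be coded by hand for each analysis.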

The present disclosure has been made in view of these points, and its object is to enable various types of analysis to be performed on common input data without the involvement of an expert with advanced programming skills.

To achieve the above object, a data analysis device according to one aspect of the present disclosure includes: a data input unit for inputting a plurality of tabular data sets each having a plurality of features; a data model setting unit that accepts the setting of relation information defining the correspondence between the tabular data sets input to the data input unit and sets a data model to be analyzed; a data adjustment unit that adjusts the data model set by the data model setting unit based on analysis setting information; a first analysis unit that performs a first analysis on the data model set by the data model setting unit and generates a first analysis result; and a second analysis unit that performs a second analysis on the data model adjusted by the data adjustment unit and generates a second analysis result.

With this configuration, when a plurality of tabular data sets are input and the setting of relation information defining the correspondence between them is accepted, the data model setting unit sets the data model to be analyzed. Once the data model has been set, the data adjustment unit adjusts it based on analysis setting information set, for example, by the user. The first analysis unit performs the first analysis on the data model set by the data model setting unit and generates the first analysis result, while the second analysis unit performs the second analysis on the data model adjusted by the data adjustment unit and generates the second analysis result. Different types of analysis can therefore be performed without forcing the user to do work such as adjusting the aggregation period of the features.
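The flow through these units can be sketched as follows. This is a minimal illustration under stated assumptions rather than the configuration of the disclosure: the adjustment is assumed to be a filter on a reference date, the two analyses are simple stand-ins, and every name (`DataModel`, `set_data_model`, `adjust`, `first_analysis`, `second_analysis`) is hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class DataModel:
    tables: dict      # table name -> list of row dicts
    relations: tuple  # (left_table, left_key, right_table, right_key) tuples


def set_data_model(tables, relations):
    """Data model setting: bundle the input tables with their relation information."""
    return DataModel(tables=dict(tables), relations=tuple(relations))


def adjust(model, settings):
    """Data adjustment: keep only rows dated before settings['reference_date']."""
    ref = settings["reference_date"]
    adjusted = {
        name: [r for r in rows if "date" not in r or r["date"] < ref]
        for name, rows in model.tables.items()
    }
    return replace(model, tables=adjusted)


def first_analysis(model):
    """Stand-in for a report-style aggregation over the unadjusted model."""
    return sum(r["amount"] for r in model.tables["orders"])


def second_analysis(adjusted_model):
    """Stand-in for a second analysis that runs on the adjusted model."""
    return len(adjusted_model.tables["orders"])


tables = {
    "orders": [
        {"amount": 100, "date": date(2018, 3, 1)},
        {"amount": 50, "date": date(2018, 9, 1)},
    ]
}
model = set_data_model(tables, [])
report_total = first_analysis(model)
adjusted = adjust(model, {"reference_date": date(2018, 7, 1)})
row_count = second_analysis(adjusted)
```

Because `adjust` returns a new model rather than modifying the original, both analyses start from the same common data model, and the first analysis is unaffected by the adjustment made for the second.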

In another aspect, a new feature included in the analysis result of at least one of the first analysis unit and the second analysis unit can be added to the data model to be analyzed next time, which reduces the burden on the user.

In another aspect, the data model to be analyzed can be set by accepting the setting of a segment for extracting part of the data from the plurality of tabular data sets. In this case, a new segment included in the analysis result of at least one of the first analysis unit and the second analysis unit can be added to the data model to be analyzed next time.

In another aspect, when the first analysis unit accepts the designation of a target variable, it performs a factor analysis that extracts features strongly related to the designated target variable and extracts segments in which the average value of the target variable is relatively high or low compared with its average over all data. Based on the result of the factor analysis performed by the first analysis unit, features strongly related to the target variable can be added to the data model to be analyzed next time.
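The comparison of a segment average with the overall average can be sketched as below. This is an illustrative simplification, assuming that a segment is defined by a single categorical feature; the function name `segment_lift` and the sample rows are hypothetical, and the disclosure does not state that the factor analysis is computed this way.

```python
from collections import defaultdict


def segment_lift(rows, feature, target):
    """Return {segment value: segment mean of target minus overall mean of target}."""
    overall = sum(r[target] for r in rows) / len(rows)
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[target])
    return {k: sum(v) / len(v) - overall for k, v in groups.items()}


rows = [
    {"region": "A", "won": 1},
    {"region": "A", "won": 1},
    {"region": "B", "won": 0},
    {"region": "B", "won": 0},
]
lift = segment_lift(rows, "region", "won")
```

A positive value marks a segment whose average target value is above the overall average, and a negative value marks one below it.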

As described above, a first analysis can be performed on a data model set based on a plurality of tabular data sets and relation information, the data model can be adjusted based on analysis setting information, and a second analysis different from the first can be performed on the adjusted data model. Various types of analysis can therefore be performed on common input data without the involvement of an expert with advanced programming skills.

FIG. 1 is a diagram showing the schematic configuration of a data analysis device according to an embodiment of the present invention.
FIG. 2 is a block diagram of the data analysis device.
FIG. 3 is a diagram showing an example in which the data period used to calculate features differs depending on the purpose of the analysis.
FIG. 4 is a flowchart showing an example of a data analysis procedure.
FIG. 5 is a block diagram showing details of the data input unit, the data model setting unit, the adjustment units, the analysis units, and the output unit.
FIG. 6 is a diagram showing an example of the screen configuration that can be displayed on the display unit.
FIG. 7 is a diagram showing an example of analysis target data.
FIG. 8 is a diagram showing an example of the analysis target data screen.
FIG. 9 is a diagram showing an example of setting relation information.
FIG. 10 is a diagram explaining three types of multiplicity.
FIG. 11 is a diagram showing an example of the relation setting screen.
FIG. 12 is a diagram showing an example of the analysis list screen.
FIG. 13 is a diagram showing an example of the setting screen for report analysis.
FIG. 14 is a diagram explaining a method of calculating the values displayed in a table for report analysis.
FIG. 15 is a diagram showing values set as a report.
FIG. 16 is a diagram showing an example of the report output screen.
FIG. 17 is a diagram showing an example of the tree analysis screen.
FIG. 18 is a diagram showing an example of the output screen when two locations in a report are selected.
FIG. 19 is a diagram showing an example of the conversion rules from report analysis to tree analysis.
FIG. 20 is a diagram corresponding to FIG. 17, showing the state of a tree analysis started from report analysis.
FIG. 21 is a diagram showing an example of a tree analysis screen that can trigger the start of report analysis from tree analysis.
FIG. 22 is a diagram showing an example of the tree analysis screen with the referral field selected.
FIG. 23 is a diagram showing an example of the conversion rules from tree analysis to report analysis.
FIG. 24 is a diagram showing an example of the output screen of a report analysis started from tree analysis.
FIG. 25 is a diagram showing an example of a screen in which report analysis information is embedded and displayed in the tree analysis display area.
FIG. 26 is a diagram showing an example of the setting screen for factor analysis.
FIG. 27 is a flowchart showing an example of the processing procedure for factor analysis.
FIG. 28 is a diagram showing an example of the segment output screen.
FIG. 29 is a diagram corresponding to FIG. 28 for the case where features are output.
FIG. 30 is a flowchart showing an example of the procedure for outputting features.
FIG. 31 is a diagram showing an example of the analysis target data screen after updating.
FIG. 32 is a diagram showing an example of the setting screen for predictive analysis.
FIG. 33 is a diagram showing an example of the scoring setting screen.
FIG. 34 is a diagram showing an example of a table relating the number of rules matched to the score.
FIG. 35 is a diagram showing data for the case where the reference date of the training data differs from that of the prediction data.
FIG. 36 is a diagram showing an example of the predicted value display screen.
FIG. 37 is a diagram corresponding to FIG. 36 for the case where ROI is calculated.
FIG. 38 is a diagram explaining the link from factor analysis to predictive analysis.
FIG. 39 is a diagram showing an example of the segment saving screen.
FIG. 40 is a diagram showing an example of the setting screen for a conditional expression.
FIG. 41 is a diagram explaining the case where segments generated from predictive analysis are checked in tabular form.
FIG. 42 is a diagram showing an example of the setting screen used when updating a report analysis.

Embodiments of the present invention will now be described in detail with reference to the drawings. The following description of the preferred embodiment is merely exemplary in nature and is not intended to limit the present invention, its applications, or its uses.

FIG. 1 is a diagram showing the schematic configuration of a data analysis device 1 according to an embodiment of the present invention, and FIG. 2 is a block diagram of the data analysis device 1. The data analysis device 1 is a device for analyzing various kinds of analysis target data, and the data analysis method according to the present invention can be carried out by using this data analysis device 1.

Before the configuration of the data analysis device 1 is described, an example of actual data analysis is described with reference to FIG. 3. Analysis 1 predicts the target variable for July to December 2018 from the actual values of the target variable for January to June 2018, and uses one year of data for the features. Analysis 2 predicts the target variable for April to June 2019 from the actual values of the target variable for January to March 2019, and uses six months of data for the features. Analysis 3 is visualization in a report, and its features use the period up to the latest data. Analyses 1 and 2 also require model training, and the period used for training differs from the period used for prediction. Thus, the appropriate aggregation period can differ among model training, prediction, and aggregation uses, and in some cases the feature values obtained by the conversion process for one analysis cannot be reused as-is in another. Even in such cases, the data analysis device 1 according to this embodiment enables various analyses from common data without the involvement of an expert with programming knowledge, for example of SQL. The configuration of the data analysis device 1 is described in detail below.

(Overall configuration of data analysis device 1)
As shown in FIGS. 1 and 2, the data analysis device 1 includes a device main body 2, a monitor 3, a keyboard 4, and a mouse 5. The monitor 3, keyboard 4, and mouse 5 are connected to the device main body 2. The device main body 2 and the monitor 3 may be integrated, or part of the device main body 2, or part of the functions it executes, may be built into the monitor 3. The data analysis device 1 has a built-in communication module (not shown) and is configured to communicate with external equipment. This makes it possible, for example, to download data from an external server over an internet connection. The communication may be wireless or wired. The keyboard 4 and mouse 5 are examples of operating means for operating the data analysis device 1, of input means for inputting various information, and of selection means for performing selection operations. In addition to or instead of the keyboard 4 and mouse 5, a touch panel input device, a voice input device, a pen-type input device, or the like may be used.

For example, a general-purpose personal computer can be made into the data analysis device 1 by installing a program capable of executing the control and processing described below, or the data analysis device 1 can be built from dedicated hardware on which the program is installed. Any of the following forms may be used: the program is installed directly on a user's personal computer, which is then used as the data analysis device 1; the program is installed on a server to build the data analysis device 1, and each user accesses it over a network from a browser on their own personal computer; or the data analysis device 1 resides in the cloud, and each user accesses it from a browser on their own personal computer. Part of the control and processing described below may also be executed on the user's personal computer and the rest on another person's personal computer or in the cloud. In other words, the control and processing executed by the data analysis device 1 need not all be performed on the same personal computer, and any system that achieves the same operation and effect as a result is the data analysis device 1. Likewise, in the data analysis method shown as an example in FIG. 4, steps S1 to S8 need not all be performed on the same personal computer. This embodiment shows an example in which the data analysis device 1 is used to analyze sales activity data and, through the analysis results, to carry out a series of activities: monitoring and drilling down into sales indicators such as the number of negotiations and the closing rate, analyzing the factors behind a change when one occurs, creating a list of prospective companies for improvement, and monitoring the progress of measures. The data analysis device 1 can, however, also be used for purposes other than supporting sales activities.

(Configuration of monitor 3)
The monitor 3 shown in FIG. 1 is, for example, an organic EL display or a liquid crystal display. The monitor 3 alone may be called the display unit, or the monitor 3 together with the display control unit 3a shown in FIG. 2 may be called the display unit. The display control unit 3a may be built into the monitor 3 or into the device main body 2. The display control unit 3a includes, for example, a display DSP that causes the monitor 3 to display images, and may also include video memory such as VRAM that temporarily stores image data while an image is displayed. Based on a display command sent from the CPU 11a of the main control unit 11 (described below), the display control unit 3a sends the monitor 3 a control signal for displaying a predetermined image. For example, it also sends control signals for displaying, in addition to the various user interfaces described below, icons and the content of the user's operations with the keyboard 4 and mouse 5. A pointer operable with the mouse 5 can also be displayed on the monitor 3. The monitor 3 may be a touch panel monitor, which gives the monitor 3 functions for inputting various information, for operating the data analysis device 1, and for performing selection operations.

(Overall configuration of device main body 2)
The device main body 2 shown in FIG. 1 includes a control unit 10 and a storage unit 30. The storage unit 30 consists of a hard disk drive, a solid state drive (SSD), or the like. The storage unit 30 is connected to and controlled by the control unit 10, and can store various data and read the stored data back. Part or all of the storage unit 30 may reside in the cloud.

(Control unit 10)
Although not specifically illustrated, the control unit 10 can be configured with an MPU, a system LSI, a DSP, dedicated hardware, or the like. The control unit 10 provides the various functions described below, which may be realized by logic circuits or by executing software.

As shown in FIG. 2, the control unit 10 includes a main control unit 11, a data input unit 12a, a data model setting unit 12b, a first adjustment unit 13, a second adjustment unit 14, a first analysis unit 15, a second analysis unit 16, an output unit 18, a third analysis unit 19, and a fourth analysis unit 20. FIG. 5 shows details of the data input unit 12a, the data model setting unit 12b, the first adjustment unit 13, the second adjustment unit 14, the first analysis unit 15, the second analysis unit 16, and the output unit 18, together with the paths over which information is sent and received. FIG. 6 shows an example of the screen configuration that can be displayed on the monitor 3. Each screen in FIG. 6 is a user interface on which various information is presented to the user and on which the user performs operations such as inputting, setting, and selecting various information. The display control unit 3a can generate each screen based on a signal from the main control unit 11 and display it on the monitor 3, but the screens may also be generated by an analysis unit such as the first analysis unit 15 or the second analysis unit 16.

As shown in FIG. 6 and described in detail later, the menu section contains the items Workflow, Analysis Target Data, Relation, Segment, and Analysis. Workflow leads to the workflow screen, Analysis Target Data to the analysis target data screen, Relation to the relation setting screen, Segment to the segment list screen, and Analysis to the analysis list screen. Data can be edited on the workflow screen, and segments can be edited by moving from the segment list screen to the segment editing screen. From the analysis list screen, report analysis, tree analysis, factor analysis, predictive analysis, and so on can be performed. In other words, the user sets the data model on the analysis target data, relation, and segment screens reached from the menu, and can then start the various analyses from the analysis list screen. The workflow screen is a screen for processing input data in advance, and preprocessing needed to make the data usable for analysis, such as deleting columns and joining data sets, may be performed on it.

As shown in FIG. 6, in this embodiment four analyses, namely report analysis, tree analysis, factor analysis, and predictive analysis, can be performed on a common data model. Report analysis and tree analysis aggregate and visualize the analysis target data in table form and tree form, respectively, and are used frequently for daily monitoring and reporting. Factor analysis and predictive analysis use machine learning. Although they are used less often than report analysis and tree analysis, they are used to solve, through advanced analysis, issues that simple aggregation cannot resolve.

Although the parts of the control unit 10 are described separately as above, a single part may be configured to execute several kinds of processing, or a part may be divided further so that several parts cooperate to execute one process. The hardware components described above are connected through an electrical communication path (wiring) such as the bus B shown in FIG. 2 so that they can communicate bidirectionally or unidirectionally as required.

The main control unit 11 performs numerical calculation, arithmetic processing, various information processing, and the like based on various programs, and also controls each part of the hardware. The main control unit 11 includes a CPU 11a that functions as a central processing unit, a work memory 11b such as RAM that serves as a work area when the main control unit 11 executes programs, and a program memory 11c such as ROM, flash ROM, or EEPROM in which a startup program, an initialization program, and the like are stored.

As also shown in FIG. 5, the data input unit 12a is the part through which the user inputs a plurality of tabular data sets (analysis target data) each having a plurality of features. In the example shown in FIG. 5, two analysis target data sets are input, but three or more may be input, and a single analysis target data set may also be input. The data input unit 12a makes it possible to execute step S1 of the flowchart shown in FIG. 4, that is, the data input step.

データ入力部１２ａは、データ入力用ユーザーインターフェース（図示せず）を生成してモニタ３に表示させる。データ入力部１２ａは、データ入力用ユーザーインターフェース上でなされた使用者の各種操作を受け付ける。使用者の操作とは、たとえば、キーボード４の操作や、マウス５の操作（ボタンクリック、ドラッグ＆ドロップ、ホイールの回転等を含む）、タッチパネル式の入力装置へのタップ操作、ドラッグ操作等があり、これらのいずれの操作であってもよい。以下、同様である。 The data input unit 12a generates a data input user interface (not shown) and displays it on the monitor 3. The data input unit 12a accepts various user operations performed on the data input user interface. User operations include, for example, keyboard 4 operations, mouse 5 operations (including button clicks, drag-and-drop, wheel rotation, etc.), tapping operations on a touch panel input device, dragging operations, etc., and any of these operations may be used. The same applies below.

例えば、分析対象データを格納したファイルが外部記憶装置や記憶部３０（図２に示す）に保存されていて、デスクトップ上や、開いた状態のフォルダにある場合には、使用者が当該ファイルをデータ入力用ユーザーインターフェース上へドラッグ＆ドロップ操作する。これにより、分析対象データを格納したファイルが読み込まれて記憶部３０の所定領域に保存される。また、分析対象データがデータベース上にある場合には、使用者がデータベースに接続し、所望の分析対象データが読み込まれて記憶部３０の所定領域に保存される。また、分析対象データがインターネットやサーバー上にある場合には、使用者がＵＲＬを入力する。分析対象データは、インターネットやサーバーからダウンロードされて記憶部３０の所定領域に保存される。上述した方法は例であり、分析対象データの入力方法はどのような方法であってもよい。以上が図４に示すフローチャートのステップＳ１のデータ入力ステップである。 For example, if a file containing data to be analyzed is saved in an external storage device or the storage unit 30 (shown in Figure 2) and is on the desktop or in an open folder, the user drags and drops the file onto the data input user interface. This loads the file containing the data to be analyzed and saves it in a specified area of the storage unit 30. Alternatively, if the data to be analyzed is stored in a database, the user connects to the database, loads the desired data to be analyzed, and saves it in a specified area of the storage unit 30. Alternatively, if the data to be analyzed is stored on the Internet or a server, the user inputs a URL. The data to be analyzed is downloaded from the Internet or a server and saved in a specified area of the storage unit 30. The above-mentioned method is an example, and any method for inputting data to be analyzed may be used. This completes the data input step of step S1 in the flowchart shown in Figure 4.

図７は、データ入力ステップで入力された分析対象データの例を示しており、ここでは、「会社」、「商談」、「営業活動」、「カレンダー」という４つの分析対象データを入力するものとする。このとき、分析用の型（例えば数値型、カテゴリ型、日付型）を属性ごとに設定する。すなわち、分析対象データは、複数の属性を含むデータであり、属性とは、分析対象データに含まれる名称と型のペアからなる項目のことである。属性には、会社ＩＤ、所在地、活動日などが存在する。型とは、属性がどのような値を取り得るかを定義する分類であり、分類の仕方はシステムによって異なるが、たとえば一般的なリレーショナルデータベースでは、ＩＮＴ型（整数）、ＲＥＡＬ型（実数）、ＤＡＴＥ型（日付）、ＶＡＲＣＨＡＲ型（文字列）などのデータ型のうち、いずれかが属性ごとに割り当てられている。実際のデータベースではこれら以外にも多種多様な型が使用されている。また、分析対象データは、例えばＣＳＶファイルやリレーショナルデータベース上のテーブルであってもよい。 Figure 7 shows an example of data to be analyzed entered in the data input step. Here, four pieces of data to be analyzed are entered: "Company," "Negotiations," "Sales Activities," and "Calendar." At this time, an analysis type (e.g., numeric, category, or date) is set for each attribute. In other words, the data to be analyzed is data containing multiple attributes, and an attribute is an item in the data to be analyzed that consists of a name and type pair. Attributes include company ID, location, and activity date. A type is a classification that defines the values that an attribute can take. While classification methods vary depending on the system, for example, in a typical relational database, each attribute is assigned a data type such as INT (integer), REAL (real number), DATE (date), or VARCHAR (character string). In actual databases, a wide variety of other types are used. Furthermore, the data to be analyzed may be, for example, a CSV file or a table in a relational database.

型情報は、リレーショナルデータベース上の型から類推してもよい。例えばデータベース上でＩＮＴ型の場合は数値型とする、等である。また、型情報は、使用者からの指定を受け付けてもよい。また、必要であれば、文字列の置換などの前処理を使用者やシステム自身によって行ってもよい。 Type information may be inferred from the type in a relational database. For example, an INT type in the database may be treated as a numeric type. Type information may also be specified by the user. If necessary, preprocessing such as string replacement may be performed by the user or the system itself.
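
The type inference described above could be sketched as follows. This is a minimal illustration, not the disclosed implementation: only the INT-to-numeric rule comes from the text, while the other mappings, the fallback to a category type, and the names `DB_TO_ANALYSIS_TYPE` and `infer_analysis_type` are assumptions introduced for the example.

```python
# Hypothetical mapping from database column types to analysis types.
# Only "INT -> numeric" is stated in the description; the rest is assumed.
DB_TO_ANALYSIS_TYPE = {
    "INT": "numeric",
    "REAL": "numeric",
    "DATE": "date",
    "VARCHAR": "category",
}


def infer_analysis_type(db_type, user_override=None):
    """Return the analysis type for an attribute.

    A type specified by the user takes precedence over the inferred one.
    """
    if user_override is not None:
        return user_override
    # Drop a length suffix such as "VARCHAR(255)" before the lookup.
    base = db_type.split("(")[0].strip().upper()
    # Assumed fallback: treat unknown database types as categories.
    return DB_TO_ANALYSIS_TYPE.get(base, "category")
```

Giving the user-specified type precedence mirrors the statement that type information may also be specified by the user.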

入力された分析対象データは、図８に示す分析対象データ画面１００上で確認することができる。分析対象データ画面１００は、表示制御部３ａが生成してモニタ３に表示させる画面である。分析対象データ画面１００には、分析対象データの名称を表示する名称表示領域１０１と、分析対象データ追加ボタン１０２とが設けられている。名称表示領域１０１には、入力された複数の分析対象データの名称を一覧表示可能になっており、この例では、入力された分析対象データの例として、「会社」、「商談」、「営業活動」、「カレンダー」が表示されている。分析対象データ追加ボタン１０２を操作することで、別の分析対象データを新たに入力することが可能になる。分析対象データ画面１００には、データ表示領域１０３も設けられている。名称表示領域１０１に表示されている複数の分析対象データの名称のうち、任意の一を使用者が選択操作すると、その選択された分析対象データの内容がデータ表示領域１０３に表形式で表示される。必要に応じて、分析対象データ画面１００上のデータに対してワークフロー画面でデータ型の変換など、さまざまな加工処理を行ってもよい。また、一旦入力した分析対象データを削除する操作を受け付けるように構成することもできる。 The entered data to be analyzed can be viewed on the analysis target data screen 100 shown in Figure 8. The analysis target data screen 100 is generated by the display control unit 3a and displayed on the monitor 3. The analysis target data screen 100 includes a name display area 101 that displays the names of the data to be analyzed and an Add Analysis Target Data button 102. The name display area 101 can display a list of the names of the entered data to be analyzed. In this example, examples of entered data to be analyzed include "Company," "Negotiations," "Sales Activities," and "Calendar." Operating the Add Analysis Target Data button 102 allows new data to be entered. The analysis target data screen 100 also includes a data display area 103. When the user selects any of the names of the data to be analyzed displayed in the name display area 101, the contents of the selected data to be analyzed are displayed in tabular format in the data display area 103. If necessary, various processing operations, such as data type conversion, may be performed on the data on the analysis target data screen 100 using the workflow screen. It can also be configured to accept an operation to delete data to be analyzed that has already been entered.

図５に示すデータモデル設定部１２ｂは、データ入力部１２ａに入力された複数の分析対象データに含まれる特徴量の関係を定めたリレーション情報の設定を受け付けて、分析対象となるデータモデルを設定する部分である。各分析対象データは表形式であることから、行と列を有している。複数の分析対象データ間の行の対応関係を定義するための情報がリレーション情報であり、このリレーション情報を使用者が追加で設定する。また、必須ではないが、必要に応じて、後述する抽出条件（セグメント）を使用者が定義し、そのセグメントに名称等を付けて記憶部３０に保存しておくこともできる。すなわち、データモデルは、分析の入力に利用される複数の表形式データ、およびそれらの対応関係を定義するリレーションの組み合わせであり、分析で共通に利用するセグメントの定義を追加で含むこともできる。 The data model setting unit 12b shown in Figure 5 is a part that accepts the setting of relation information that defines the relationships between feature quantities contained in multiple pieces of data to be analyzed, input into the data input unit 12a, and sets the data model to be analyzed. Each piece of data to be analyzed is in tabular format, and therefore has rows and columns. Relation information is information that defines the corresponding relationships between rows of multiple pieces of data to be analyzed, and this relation information is additionally set by the user. Also, although not required, if necessary, the user can define extraction conditions (segments), which will be described later, and store these segments in the storage unit 30 with names, etc. In other words, the data model is a combination of multiple pieces of tabular data used as input for analysis and the relations that define the corresponding relationships between them, and can also include definitions of segments commonly used in the analysis.

上記リレーション情報の設定は、図４に示すフローチャートのステップＳ２で実行する。ステップＳ２の処理内容について、図９～図１１に基づいて説明する。図９は、複数の分析対象データ間のリレーション関係を説明する図であり、また、図１０は、多重度の種類を示すものである。ステップＳ２は、データモデル設定ステップに相当する。 The setting of the above relationship information is performed in step S2 of the flowchart shown in Figure 4. The processing content of step S2 will be explained with reference to Figures 9 to 11. Figure 9 explains the relationship between multiple pieces of data to be analyzed, and Figure 10 shows the types of multiplicity. Step S2 corresponds to the data model setting step.

ステップＳ２では、まず、図１１に示すようなリレーション設定画面１１０にて、使用者が分析対象データ間のリレーション（紐づけ）を定義する。リレーション設定画面１１０は、表示制御部３ａが生成してモニタ３に表示させる画面である。リレーション設定画面１１０には、リレーション表示領域１１１が設けられており、このリレーション表示領域１１１において異なる分析対象データ間のリレーションを任意に設定可能になっている。リレーションの設定は、使用者が分析対象データの組に対してそれぞれ属性の名前を指定することで行われる。リレーション表示領域１１１は、複数の領域１１１ａ～１１１ｄを含んでいる。各領域１１１ａ～１１１ｄは同様に構成されており、例えば最も上に位置する領域１１１ａについて説明すると、一の分析対象データの属性の名前を指定する第１指定部１１１ｅと、他の分析対象データの属性の名前を指定する第２指定部１１１ｆとが設けられている。この例では、第１指定部１１１ｅで「会社」の分析対象データの属性の名前から任意の一の名前を指定し、第２指定部１１１ｆで「商談」の分析対象データの属性の名前から任意の一の名前を指定している。他の領域１１１ｂ～１１１ｄでも同様にして指定できる。この指定操作を経ることで、指定した属性の値が一致する行同士が対応しているとみなされる。 In step S2, the user first defines relationships (links) between the data to be analyzed on a relationship setting screen 110 as shown in FIG. 11. The relationship setting screen 110 is a screen generated by the display control unit 3a and displayed on the monitor 3. The relationship setting screen 110 is provided with a relationship display area 111, in which relationships between different data to be analyzed can be arbitrarily set. Relationships are set by the user specifying the name of an attribute for each set of data to be analyzed. The relationship display area 111 includes multiple areas 111a-111d. Each of the areas 111a-111d is similarly configured. For example, the topmost area 111a is provided with a first designation section 111e for designating the name of an attribute of one piece of data to be analyzed, and a second designation section 111f for designating the name of an attribute of another piece of data to be analyzed. In this example, the first designation section 111e designates one arbitrary name from the attribute names of the data to be analyzed for "Company," and the second designation section 111f designates one arbitrary name from the attribute names of the data to be analyzed for "Business Negotiations." Designations can be made in the same way for the other areas 111b-111d. By going through this designation operation, rows with matching values for the designated attributes are considered to correspond to each other.

リレーションに使われる属性のことを、結合キーと呼ぶ。例えば「会社」と「商談」の分析対象データに対して、会社ＩＤを結合キーとしたリレーションを設定した場合、同じ会社ＩＤの行同士が、紐づいているとみなされる。図９における符号Ｌはリレーション関係を示す線である。 An attribute used in a relation is called a join key. For example, if a relation is set with the company ID as the join key for the analysis target data of "Company" and "Negotiation," rows with the same company ID are considered to be linked. The symbol L in Figure 9 is a line that indicates the relationship.
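
As a rough illustration of how a join key links rows, the following sketch pairs rows of two tables whose join-key values match. The row representation (a list of dictionaries) and the function name `link_rows` are assumptions made for the example and are not taken from the disclosure.

```python
from collections import defaultdict


def link_rows(left_rows, right_rows, left_key, right_key):
    """Return (left_row, right_row) pairs whose join-key values are equal."""
    # Index the right-hand table by its join-key value.
    index = defaultdict(list)
    for row in right_rows:
        index[row[right_key]].append(row)
    # Pair each left-hand row with every right-hand row sharing its key value.
    return [
        (left, right)
        for left in left_rows
        for right in index.get(left[left_key], [])
    ]
```

With the company ID as the join key, each "Company" row would be paired with all of its rows in the negotiations data, and rows without a matching key value would simply produce no pair.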

リレーショナルデータベースの場合、データベースの側で既に分析対象データの属性同士の対応関係が定義されていることがある。このデータベース側で定義される対応関係を外部キー制約と呼ぶ。この外部キー制約がある場合、設定済の対応関係を上記ステップＳ１においてデータ分析装置１側で読み込み、分析対象データ間のリレーションの定義をデータ分析装置１側で自動的に設定してもよい。 In the case of a relational database, the correspondence between attributes of the data to be analyzed may already be defined on the database side. This correspondence defined on the database side is called a foreign key constraint. If this foreign key constraint exists, the already set correspondence may be read on the data analysis device 1 side in step S1 above, and the definition of the relationship between the data to be analyzed may be automatically set on the data analysis device 1 side.

以上のようにして分析対象データ間のリレーションが定義されると、データ分析装置１側ではそれぞれの対応関係について多重度を自動判別する。多重度の自動判別は、制御ユニット１０で行われる。図１０に示すように、多重度には１：１型、１：Ｎ型、Ｎ：Ｎ型の３種類があり、分析対象データの内容を参照することで判別することができる。１：１型は、一方の分析対象データの１行が他方の分析対象データの１行に対応している関係である。１：Ｎ型は、一方の分析対象データの１行に他方の分析対象データが複数行対応している関係である。Ｎ：Ｎ型は、一方の分析対象データの１行に他方の分析対象データが複数行対応し、他方の分析対象データの１行に一方の分析対象データが複数行対応している関係である。 Once the relationships between the data to be analyzed are defined as described above, the data analysis device 1 automatically determines the multiplicity of each correspondence. The automatic determination of multiplicity is performed by the control unit 10. As shown in Figure 10, there are three types of multiplicity: 1:1, 1:N, and N:N, which can be determined by referencing the contents of the data to be analyzed. In the 1:1 type, one row of one piece of data to be analyzed corresponds to one row of the other. In the 1:N type, one row of one piece of data to be analyzed corresponds to multiple rows of the other. In the N:N type, one row of one piece of data to be analyzed corresponds to multiple rows of the other, and one row of the other also corresponds to multiple rows of the first.
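
One way the multiplicity could be determined from the data contents is to check whether the join-key values are unique on each side. The sketch below assumes this uniqueness test as the criterion; the disclosure does not specify the actual determination logic, and the function name `detect_multiplicity` is hypothetical.

```python
from collections import Counter


def detect_multiplicity(left_rows, right_rows, left_key, right_key):
    """Classify a relation as "1:1", "1:N", or "N:N" from join-key uniqueness."""
    left_counts = Counter(row[left_key] for row in left_rows)
    right_counts = Counter(row[right_key] for row in right_rows)
    left_unique = all(count == 1 for count in left_counts.values())
    right_unique = all(count == 1 for count in right_counts.values())
    if left_unique and right_unique:
        return "1:1"
    if left_unique or right_unique:
        # The key is unique on exactly one side; which side is the "1" side
        # is not distinguished here.
        return "1:N"
    return "N:N"
```

Because the result depends only on the current contents of the two tables, it can be recomputed whenever the user changes a join key, which is consistent with a display that is updated in real time.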

図１１に示すリレーション設定画面１１０には、多重度表示領域１１２が設けられている。多重度表示領域１１２には、上述のようにして自動判別された多重度の判別結果が表示される。この例では、「会社」と「商談」の間は１：Ｎの関係であることが分かる。多重度表示領域１１２に表示される判定結果は、リアルタイムで更新される。 The relationship setting screen 110 shown in Figure 11 has a multiplicity display area 112. The multiplicity display area 112 displays the results of the multiplicity determination automatically determined as described above. In this example, it can be seen that there is a 1:N relationship between "Company" and "Business Negotiations." The determination results displayed in the multiplicity display area 112 are updated in real time.

また、データモデル設定部１２ｂは、さらに、複数の分析対象データの中から一部のデータを抽出するためのセグメントの設定を受け付けることが可能に構成されている。セグメントは、分析対象データに対して行の抽出条件を適用することで抽出されるデータの部分集合であり、属性とその条件との組み合わせと呼ぶこともできる。 The data model setting unit 12b is also configured to be able to accept the setting of segments for extracting a portion of data from multiple pieces of data to be analyzed. A segment is a subset of data extracted by applying row extraction conditions to the data to be analyzed, and can also be called a combination of an attribute and its conditions.
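
A segment, understood as a combination of attributes and conditions, could be represented as a list of (attribute, operator, value) triples that is applied as a row filter. The representation and the names `OPERATORS` and `apply_segment` below are assumptions chosen for illustration only.

```python
import operator

# Comparison operators a condition may use (an assumed, non-exhaustive set).
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def apply_segment(rows, conditions):
    """Return the subset of rows that satisfies every condition of the segment."""
    return [
        row
        for row in rows
        if all(OPERATORS[op](row[attr], value) for attr, op, value in conditions)
    ]
```

Storing the conditions rather than the extracted rows lets the same named segment be reapplied when the underlying data changes.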

分析対象データ、リレーション情報及びセグメントはデータモデルを構成しており、これらは後述する様々な分析を行う際に共通の入力及び設定情報となる。すなわち、データモデル設定部１２ｂは、分析対象データの入力、リレーション情報の設定及びセグメントの設定を受け付けて、分析対象となるデータモデルを設定する。データモデル設定部１２ｂによってデータモデルの設定が完了すると、使用者は単一のデータモデルから、様々な分析を自由に開始することができる。 The data to be analyzed, relationship information, and segments make up a data model, which serves as common input and setting information when performing the various analyses described below. In other words, the data model setting unit 12b accepts the input of the data to be analyzed, the setting of relationship information, and the setting of segments, and sets the data model to be analyzed. Once the data model setting unit 12b has completed setting the data model, the user can freely start various analyses from a single data model.

ここで、データ分析装置１の詳細構造について図５に基づいて説明する。各部の具体的な機能及び動作については、後述するフローチャートやモニタ３に表示される画面例に基づいて説明し、ここでは概略を説明する。 The detailed structure of the data analysis device 1 will now be described with reference to Figure 5. The specific functions and operations of each component will be explained with reference to the flowcharts described below and screen examples displayed on the monitor 3, but only an overview will be provided here.

図５に示すように、データモデル設定部１２ｂにより設定されたデータモデルは、第１調整部（データ調整部）１３及び第２調整部（データ調整部）１４にそれぞれ入力される。第１調整部１３及び第２調整部１４では、使用者により分析設定情報が設定されていれば、その分析設定情報に基づいてデータモデルを調整する。分析設定情報が設定されていなければ、第１調整部１３及び第２調整部１４でデータモデルの調整は行われない。分析設定情報には、目的変数が含まれていてもよく、この目的変数は使用者により指定されたものであってもよい。 As shown in FIG. 5, the data model set by the data model setting unit 12b is input to the first adjustment unit (data adjustment unit) 13 and the second adjustment unit (data adjustment unit) 14, respectively. If analysis setting information has been set by the user, the first adjustment unit 13 and the second adjustment unit 14 adjust the data model based on the analysis setting information. If analysis setting information has not been set, the first adjustment unit 13 and the second adjustment unit 14 do not adjust the data model. The analysis setting information may include a target variable, which may be specified by the user.

第１分析部１５は、データモデル設定部１２ｂにより設定されたデータモデルに対して、第１の分析を実行し、第１の分析結果を生成する部分であり、第１変換・結合処理部１５ａと第１処理エンジン１５ｂとを有している。第１変換・結合処理部１５ａは、第１分析部１５に入力されたデータモデルに基づいて必要な変換・結合処理を内部で自動的に行う部分である。この変換・結合処理には、特許文献１に開示されているような特徴量の自動生成処理が含まれていてもよい。第１変換・結合処理部１５ａで変換・結合処理が行われたデータモデルは、第１処理エンジン１５ｂに入力される。第１処理エンジン１５ｂで行われる分析処理には、機械学習を用いた処理、ＳＱＬなどを用いた集計処理の少なくとも一方または両方が含まれる。尚、第１分析部１５は、第１調整部１３でデータモデルの調整が行われていなければ、調整されていないデータモデルに対して分析を実行するが、第１調整部１３でデータモデルの調整が行われていれば、調整されたデータモデルに対して分析を実行する場合がある。 The first analysis unit 15 is a unit that performs a first analysis on the data model set by the data model setting unit 12b and generates a first analysis result. It includes a first conversion and merging processing unit 15a and a first processing engine 15b. The first conversion and merging processing unit 15a is a unit that automatically performs the necessary conversion and merging processing internally based on the data model input to the first analysis unit 15. This conversion and merging processing may include automatic feature generation processing such as that disclosed in Patent Document 1. The data model that has undergone conversion and merging processing by the first conversion and merging processing unit 15a is input to the first processing engine 15b. The analysis processing performed by the first processing engine 15b includes at least one or both of processing using machine learning and aggregation processing using SQL or the like. Note that if the data model has not been adjusted by the first adjustment unit 13, the first analysis unit 15 performs analysis on the unadjusted data model. However, if the data model has been adjusted by the first adjustment unit 13, the first analysis unit 15 may perform analysis on the adjusted data model.

使用者により指定された目的変数が分析設定情報に含まれている場合、第１分析部は要因分析を実行する。第１分析部は、指定された目的変数と関連度が大きい特徴量を抽出するとともに、全データの目的変数の平均値と比較し、目的変数の平均値が相対的に高くなる又は低くなるセグメントを抽出するための要因分析を実行することで、より深いデータ分析が可能になる。 If the analysis setting information includes an objective variable specified by the user, the first analysis unit performs factor analysis. In this factor analysis, the first analysis unit extracts features that are highly related to the specified objective variable and, by comparison with the average value of the objective variable over all data, extracts segments in which the average value of the objective variable is relatively high or low. This enables more in-depth data analysis.
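
The comparison against the overall average can be illustrated with a deliberately simple sketch: group the rows by the values of one attribute and report how far each group's average of the objective variable lies from the average over all data. This is an assumed simplification, not the factor analysis of the disclosure, and the name `segment_differences` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def segment_differences(rows, attribute, target):
    """Return (attribute value, group mean minus overall mean) pairs.

    Pairs are sorted so that the groups deviating most from the overall
    mean, in either direction, come first.
    """
    overall = mean(row[target] for row in rows)
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    differences = [
        (value, mean(values) - overall) for value, values in groups.items()
    ]
    return sorted(differences, key=lambda item: abs(item[1]), reverse=True)
```

A positive difference marks a candidate segment in which the objective variable is relatively high, and a negative difference one in which it is relatively low.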

第１分析部１５は、使用者により指定された目的変数と関連度が大きい特徴量として、元のデータモデルには存在しない新しい特徴量を自動的に生成することもできる。 The first analysis unit 15 can also automatically generate new features that do not exist in the original data model as features that are highly related to the target variable specified by the user.

第２分析部１６は、第２調整部１４により調整されたデータモデルに対して第２の分析を実行し、第２の分析結果を生成する部分であり、第２変換・結合処理部１６ａと第２処理エンジン１６ｂとを有している。第２変換・結合処理部１６ａは、第２分析部１６に入力されたデータモデルに基づいて第１変換・結合処理部１５ａと同様に、変換・結合処理を内部で自動的に行う部分である。第２変換・結合処理部１６ａで変換・結合処理が行われたデータモデルは、第２処理エンジン１６ｂに入力される。第２処理エンジン１６ｂは、第１処理エンジン１５ｂと同様に構成されている。尚、第２分析部１６は、第２調整部１４でデータモデルの調整が行われていれば、調整されたデータモデルに対して分析を実行するが、第２調整部１４でデータモデルの調整が行われていなければ、調整されていないデータモデルに対して分析を実行する場合がある。 The second analysis unit 16 is a part that performs a second analysis on the data model adjusted by the second adjustment unit 14 and generates second analysis results, and has a second conversion and merging processing unit 16a and a second processing engine 16b. The second conversion and merging processing unit 16a is a part that automatically performs conversion and merging processing internally, similar to the first conversion and merging processing unit 15a, based on the data model input to the second analysis unit 16. The data model that has undergone conversion and merging processing by the second conversion and merging processing unit 16a is input to the second processing engine 16b. The second processing engine 16b is configured similarly to the first processing engine 15b. Note that if the data model has been adjusted by the second adjustment unit 14, the second analysis unit 16 performs analysis on the adjusted data model. However, if the data model has not been adjusted by the second adjustment unit 14, the second analysis unit 16 may perform analysis on the unadjusted data model.

また、第１分析部１５は、予測対象のデータごとに、目的変数の値を予測する予測分析を実行することもできる。この場合、第１分析部１５は、分析設定情報として、使用者による予測基準日の設定を受け付けることができる。第１分析部１５は、予測基準日を受け付けると、予測対象のデータモデルの中に集計期間をパラメータに持つ特徴量が含まれている場合は、受け付けた予測基準日に基づいて、集計期間をパラメータに持つ各特徴量の値を自動的に再計算する処理を実行する。 The first analysis unit 15 can also perform predictive analysis to predict the value of the objective variable for each piece of data to be predicted. In this case, the first analysis unit 15 can accept a prediction base date set by the user as analysis setting information. Upon accepting the prediction base date, if the data model to be predicted includes features that have the aggregation period as a parameter, the first analysis unit 15 executes a process to automatically recalculate the values of each feature that has the aggregation period as a parameter, based on the accepted prediction base date.

分析に機械学習を用いる場合には、第１分析部１５は、分析設定情報として使用者により設定された学習基準日を受け付ける。第１分析部１５は、受け付けた学習基準日よりも前の期間に集計されたデータに基づいて特徴量を集計し、学習基準日よりも後の期間に集計されたデータに基づいて目的変数を集計することにより、要因分析を実行することができる。つまり、特徴量を集計するためのデータの集計期間と、目的変数を集計するためのデータの集計期間とを使用者によって任意に変えることができる。 When machine learning is used for the analysis, the first analysis unit 15 accepts a learning reference date set by the user as analysis setting information. The first analysis unit 15 can perform factor analysis by aggregating feature amounts based on data aggregated for a period before the accepted learning reference date, and aggregating objective variables based on data aggregated for a period after the learning reference date. In other words, the user can freely change the data aggregation period for aggregating feature amounts and the data aggregation period for aggregating objective variables.
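
The split around the learning reference date could look like the sketch below, which builds one training row per company: the feature is counted from activity before the date, and the objective variable is taken from negotiations on or after it. The table layouts, the field names, and the choice to assign the reference date itself to the objective-variable period are all assumptions for this example.

```python
from datetime import date


def build_training_row(company_id, activities, negotiations, learning_date):
    """Build one training row using the learning reference date as the split."""
    # Feature: number of sales activities strictly before the reference date.
    activity_count = sum(
        1
        for activity in activities
        if activity["company_id"] == company_id and activity["date"] < learning_date
    )
    # Objective variable: whether a negotiation was won on or after the date
    # (including the date itself is an assumption of this sketch).
    won_after = any(
        deal["company_id"] == company_id
        and deal["date"] >= learning_date
        and deal["won"]
        for deal in negotiations
    )
    return {
        "company_id": company_id,
        "activity_count": activity_count,
        "won_after": int(won_after),
    }


# Example call with hypothetical data.
example_row = build_training_row(
    1,
    [{"company_id": 1, "date": date(2021, 5, 1)}],
    [{"company_id": 1, "date": date(2021, 7, 1), "won": True}],
    learning_date=date(2021, 6, 1),
)
```

Keeping the two aggregation periods on opposite sides of the reference date prevents information from the objective-variable period from entering the features.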

第１分析部１５が要因分析を実行した場合、第２分析部１６は、第１分析部１５が実行した要因分析により、目的変数との関連度が高い特徴量として抽出された特徴量が付加されたデータモデルに基づいて、予測分析を実行することが可能である。このとき、第２分析部１６は、分析設定情報として、学習基準日とは異なる予測基準日の設定を受け付けることができる。予測対象のデータモデルの中に、集計期間をパラメータに持つ特徴量が含まれている場合があり、この場合、予測分析に用いられる特徴量を集計するためのデータの集計期間は、要因分析に用いられる特徴量を集計するためのデータの集計期間と異なる。そのため、予測分析を行う場合は、要因分析により抽出された特徴量をそのまま用いるのではなく、予測分析に適した特徴量を得るために、第２分析部１６は、予測基準日に基づいて各特徴量の値を自動的に再計算する。 When the first analysis unit 15 performs factor analysis, the second analysis unit 16 can perform predictive analysis based on a data model to which feature quantities extracted by the factor analysis performed by the first analysis unit 15 as feature quantities highly associated with the target variable have been added. In this case, the second analysis unit 16 can accept, as analysis setting information, the setting of a prediction base date that differs from the learning base date. The data model to be predicted may include feature quantities that have an aggregation period as a parameter. In this case, the aggregation period for the data used to aggregate the feature quantities used in the predictive analysis differs from the aggregation period for the data used in the factor analysis. Therefore, when performing predictive analysis, the feature quantities extracted by the factor analysis are not used as is, but rather the second analysis unit 16 automatically recalculates the value of each feature quantity based on the prediction base date to obtain feature quantities suitable for the predictive analysis.
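
The recalculation of a feature that has an aggregation period as a parameter can be sketched as a function of the base date: the same definition is evaluated once with the learning reference date and again with the prediction base date. The window convention (a fixed number of days ending just before the base date) and the name `count_in_window` are assumptions for illustration.

```python
from datetime import date, timedelta


def count_in_window(event_dates, base_date, window_days):
    """Count events in the window_days-long period ending just before base_date."""
    start = base_date - timedelta(days=window_days)
    return sum(1 for event_date in event_dates if start <= event_date < base_date)


# Hypothetical activity dates for one company.
activity_dates = [
    date(2021, 1, 10),
    date(2021, 2, 20),
    date(2021, 3, 5),
    date(2021, 3, 20),
]

# The same feature definition, evaluated for two different base dates.
feature_for_learning = count_in_window(activity_dates, date(2021, 3, 1), 30)
feature_for_prediction = count_in_window(activity_dates, date(2021, 4, 1), 30)
```

Because only the base date changes, a feature found useful in the factor analysis can be reused for prediction without redefining it by hand.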

第２分析部１６は、ルールベース方式にしたがって予測分析によって予測された予測対象のデータごとの目的変数のスコアリングを行うこと、及び機械学習方式にしたがって予測分析によって予測された予測対象のデータごとの目的変数のスコアリングを行うことが可能である。この場合、第２分析部１６は、使用者はルールベース方式と機械学習方式のいずれか一方の選択操作を受け付ける。第２分析部１６は、ルールベース方式と機械学習方式のいずれかの方式から、使用者により選択された方式にしたがって、予測分析のスコアリングを行う。 The second analysis unit 16 can score the objective variable predicted by the predictive analysis for each piece of data to be predicted, using either a rule-based method or a machine learning method. In this case, the second analysis unit 16 accepts a user operation selecting one of the two methods, and performs the scoring for the predictive analysis according to the method selected by the user.

予測分析のスコアリングを行った場合、第２分析部１６は、データを、スコアが高いデータから順に並べてモニタ３に表示させることができる。第２分析部１６は、使用者からある施策を適用すべきデータ範囲の入力を受け付けることができるとともに、その施策１件あたりにかかるコストの入力と、目的達成１件あたりに得られる利益の入力とを受け付けることができる。第２分析部１６は、前記データ範囲に含まれるデータの数と、施策１件あたりにかかるコストと、目的達成１件あたりに得られる利益とに基づいて、前記データ範囲に施策を適用した場合にかかる総コストと、得られる総利益を計算することができる。 When predictive analysis scoring has been performed, the second analysis unit 16 can display the data on the monitor 3 in descending order of score. The second analysis unit 16 can accept input from the user of the data range to which a certain measure should be applied, as well as input of the cost per measure and the profit obtained per achievement of the objective. The second analysis unit 16 can calculate the total cost and total profit obtained when a measure is applied to the data range, based on the number of data included in the data range, the cost per measure, and the profit obtained per achievement of the objective.

第２分析部１６は、施策の総コストと、施策によって得られる総利益を計算するとともに、施策を実行すべきデータ件数を自動的に計算することで、施策を実行した場合に得られる投資対効果を数値で使用者に提示することができる。 The second analysis unit 16 calculates the total cost of the measure and the total profit obtained from the measure, and automatically calculates the number of data items for which the measure should be implemented, thereby presenting the user with numerical values of the return on investment obtained when the measure is implemented.
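
The cost and profit calculation could be sketched as follows. The description does not state how the number of achieved objectives is estimated, so this example assumes that each score is a probability of achieving the objective and that the expected number of achievements is the sum of the scores; the function names are likewise hypothetical.

```python
def campaign_totals(scores_in_range, cost_per_action, profit_per_success):
    """Return (total cost, expected total profit) for the selected data range."""
    total_cost = len(scores_in_range) * cost_per_action
    # Assumption: scores are probabilities, so their sum is the expected
    # number of achieved objectives.
    total_profit = sum(scores_in_range) * profit_per_success
    return total_cost, total_profit


def best_target_count(scores, cost_per_action, profit_per_success):
    """Return (n, net profit) for the top-n range that maximizes net profit."""
    ranked = sorted(scores, reverse=True)
    best_n, best_net = 0, 0.0
    for n in range(1, len(ranked) + 1):
        cost, profit = campaign_totals(
            ranked[:n], cost_per_action, profit_per_success
        )
        if profit - cost > best_net:
            best_n, best_net = n, profit - cost
    return best_n, best_net
```

Ranking by score first matches the display order described above, so the chosen count corresponds to a range at the top of the displayed list.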

図２に示すように、データ分析装置１は、第３分析部１９及び第４分析部２０を備えていてもよい。第３分析部１９は、データモデルに基づいて帳票分析を実行する部分であり、モニタ３上に、マトリクスに帳票分析結果を表示させる。第３分析部１９は、マトリクス上で、基準データと、比較データの選択を使用者から受け付け、受け付けた２つのデータの差異に関連した情報をモニタ３上にさらに表示させる。 As shown in FIG. 2, the data analysis device 1 may include a third analysis unit 19 and a fourth analysis unit 20. The third analysis unit 19 is a unit that performs report analysis based on a data model and displays the report analysis results in a matrix on the monitor 3. The third analysis unit 19 accepts the user's selection of reference data and comparison data on the matrix, and further displays information related to the differences between the two selected pieces of data on the monitor 3.

第４分析部２０は、前記２つのデータの差異に関連した情報をツリー状に表示するツリー分析を実行する部分であり、前記２つのデータの差異を特定の特徴量に注目してモニタ３に表示させる。データ分析装置１は、第３分析部１９及び第４分析部２０を備えている場合、第４分析部２０によるツリー分析から、第３分析部１９による帳票分析を派生させてモニタ３に表示可能に構成されている。 The fourth analysis unit 20 is a part that performs tree analysis to display information related to the differences between the two pieces of data in a tree-like format, and displays the differences between the two pieces of data on the monitor 3, focusing on specific feature quantities. When the data analysis device 1 is equipped with the third analysis unit 19 and the fourth analysis unit 20, it is configured to be able to derive a report analysis by the third analysis unit 19 from the tree analysis by the fourth analysis unit 20 and display it on the monitor 3.

出力部１８は、第１分析部１５及び第２分析部１６の少なくともいずれか一方の分析結果に含まれる新たな特徴量を、次回の分析対象となるデータモデルに付加する部分である。第１分析部１５が第１の分析を実行するとその分析結果が取得されるが、この分析結果には、別の分析に役立つ特徴量が含まれている場合がある。第２分析部１６で取得された分析結果も同様である。このような特徴量が含まれている場合には、その特徴量を次回の分析対象となるデータモデルに付加することで、次回の分析では新たな特徴量を用いた分析が可能になる。 The output unit 18 is a part that adds new features contained in the analysis results of at least one of the first analysis unit 15 and the second analysis unit 16 to the data model to be analyzed next. When the first analysis unit 15 performs a first analysis, the analysis results are obtained, and these analysis results may contain features that are useful for another analysis. The same is true for the analysis results obtained by the second analysis unit 16. If such features are included, adding these features to the data model to be analyzed next makes it possible to use the new features in the next analysis.

また、出力部１８は、第１分析部１５及び第２分析部１６の少なくともいずれか一方の分析結果に含まれる新たなセグメントを、次回の分析対象となるデータモデルに付加することもできる。セグメントは、分析対象データに対して行の抽出条件を適用することで抽出されるデータの部分集合であり、属性とその条件との組み合わせと呼ぶこともできる。このセグメントが第１分析部１５から取得された分析結果や第２分析部１６から取得された分析結果に含まれている場合がある。このようなセグメントが含まれている場合には、そのセグメントを次回の分析対象となるデータモデルに付加することで、次回の分析では新たなセグメントを用いた分析が可能になる。要するに、分析の結果として特徴量やセグメントが得られた場合、それらをデータモデルに付加することで、ある分析で取得された結果を、別の分析の入力として簡単に用いることができる。 The output unit 18 can also add new segments included in the analysis results of at least one of the first analysis unit 15 and the second analysis unit 16 to the data model to be analyzed next. A segment is a subset of data extracted by applying row extraction conditions to the data to be analyzed, and can also be called a combination of attributes and their conditions. This segment may be included in the analysis results obtained from the first analysis unit 15 or the analysis results obtained from the second analysis unit 16. If such a segment is included, adding that segment to the data model to be analyzed next makes it possible to perform analysis using the new segment in the next analysis. In other words, if features or segments are obtained as a result of an analysis, adding them to the data model makes it easy to use the results obtained from one analysis as input for another analysis.
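
The idea of feeding analysis results back into the data model can be illustrated with a small container type. The field layout of `DataModel` and the function `attach_results` are assumptions made for this sketch; the disclosure specifies only that the data model combines tabular data, relations, and optionally segments, and that new features and segments can be added to it.

```python
from dataclasses import dataclass, field


@dataclass
class DataModel:
    """Hypothetical container for the parts of a data model."""

    tables: dict = field(default_factory=dict)      # table name -> list of rows
    relations: list = field(default_factory=list)   # (table_a, key_a, table_b, key_b)
    segments: dict = field(default_factory=dict)    # segment name -> conditions
    features: dict = field(default_factory=dict)    # feature name -> definition


def attach_results(model, new_features=None, new_segments=None):
    """Add features and segments from an analysis result to the model in place."""
    model.features.update(new_features or {})
    model.segments.update(new_segments or {})
    return model
```

Since every analysis reads from the same model object, a segment or feature attached after one analysis is available as input to the next one without further conversion.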

第１分析部１５が要因分析を実行した場合、出力部１８は、第１分析部１５が実行した要因分析の結果に基づいて、使用者により指定された目的変数との関連度が大きい特徴量を次回の分析対象となるデータモデルに付加する。また、第１分析部１５が要因分析を実行した結果、セグメントが抽出された場合、出力部１８は、実行した要因分析に基づいて抽出されたセグメントを、次回の分析対象となるデータモデルに付加する。 When the first analysis unit 15 performs a factor analysis, the output unit 18 adds features that are highly associated with the target variable specified by the user to the data model to be analyzed next, based on the results of the factor analysis performed by the first analysis unit 15. Furthermore, when segments are extracted as a result of the factor analysis performed by the first analysis unit 15, the output unit 18 adds the segments extracted based on the performed factor analysis to the data model to be analyzed next.

第２分析部１６が要因分析を実行してスコアリングした場合、出力部１８は、予測分析によって特定されたスコアが高い一部のデータをセグメントとして出力し、次回の分析対象のデータモデルに付加することが可能になっている。 When the second analysis unit 16 performs factor analysis and scoring, the output unit 18 can output some of the data with high scores identified by the predictive analysis as segments and add them to the data model to be analyzed next time.

図４に示すフローチャートのステップＳ２でリレーションの定義が完了すると、ステップＳ３に進む。ステップＳ３では、帳票分析及びツリー分析を実行する。ステップＳ３の説明を行う前に、以下、データモデルの設定完了後、データ分析装置１による分析の例について説明する。データ分析装置１は、データモデルの設定が完了すると図１２に示す分析一覧画面１２０を表示制御部３ａが生成してモニタ３に表示させる。分析一覧画面１２０には、分析種別選択部１２１が設けられている。分析種別選択部１２１をクリックすると、「帳票分析」、「ツリー分析」、「要因分析」、「予測分析」等の分析種別が表示され、それらの中から使用者が所望の分析種別を選択できる。この選択操作は、どのような操作であってもよく、キーボード４やマウス５等を用いて行うことができる。作成ボタン１２２を押すと、選択された分析が実行される。図１２に示す例では、「帳票分析」が選択された場合を示している。要因分析は、第１分析部１５で実行される第１の分析であり、また、予測分析は、第２分析部１６で実行される第２の分析であり、また、帳票分析は、第３分析部１９で実行される第３の分析であり、さらに、ツリー分析は、第４分析部２０で実行される第４の分析である。 Once the definition of the relationship is completed in step S2 of the flowchart shown in FIG. 4, the process proceeds to step S3. In step S3, a report analysis and a tree analysis are performed. Before explaining step S3, an example of an analysis performed by the data analysis device 1 after the data model configuration is complete will be described below. When the data model configuration is complete, the data analysis device 1 causes the display control unit 3a to generate the analysis list screen 120 shown in FIG. 12 and display it on the monitor 3. The analysis list screen 120 includes an analysis type selection section 121. Clicking the analysis type selection section 121 displays analysis types such as "report analysis," "tree analysis," "factor analysis," and "predictive analysis," allowing the user to select the desired analysis type from among them. This selection operation can be performed by any operation and can be performed using the keyboard 4, mouse 5, etc. Pressing the Create button 122 executes the selected analysis. The example shown in FIG. 12 shows the case where "report analysis" is selected. Factor analysis is the first analysis performed by the first analysis unit 15, predictive analysis is the second analysis performed by the second analysis unit 16, report analysis is the third analysis performed by the third analysis unit 19, and tree analysis is the fourth analysis performed by the fourth analysis unit 20.

データ分析装置１の主制御部１１は「帳票分析」が選択されたことを検出すると、帳票分析の設定を受け付ける。まず、図１３に示す帳票分析の設定画面１３０を表示制御部３ａが生成してモニタ３に表示させる。帳票分析の設定画面１３０には、分析対象データの属性を表示する属性表示領域１３１が設けられている。属性表示領域１３１には、既に入力されている全ての分析対象データの属性が、分析対象データごとにまとめて表示される。この場合「会社」「商談」「営業活動」「カレンダー」の属性が表示されている。 When the main control unit 11 of the data analysis device 1 detects that "Report Analysis" has been selected, it accepts the settings for the report analysis. First, the display control unit 3a generates the report analysis settings screen 130 shown in FIG. 13 and displays it on the monitor 3. The report analysis settings screen 130 has an attribute display area 131 that displays the attributes of the data to be analyzed. The attribute display area 131 displays the attributes of all data to be analyzed that have already been entered, grouped by data to be analyzed. In this case, the attributes of "Company," "Business Negotiations," "Sales Activities," and "Calendar" are displayed.

帳票分析の設定画面１３０には、帳票分析用の行及び列を定義するための列エリア１３２及び行エリア１３３が設けられている。列エリア１３２や行エリア１３３には、それぞれ属性表示領域１３１に表示されている属性を入力できる。例えば所望の属性を選択して列エリア１３２や行エリア１３３にドラッグ＆ドロップ操作によって配置してもよく、その入力操作はどのような操作であってもよい。つまり、分析対象データの属性を列エリア１３２及び行エリア１３３に配置していく操作を使用者が行うことで、帳票の行及び列を容易に定義することができる。 The report analysis settings screen 130 has a column area 132 and a row area 133 for defining the rows and columns used in the report analysis. Attributes displayed in the attribute display area 131 can be entered into the column area 132 and the row area 133. For example, a desired attribute can be selected and placed in the column area 132 or the row area 133 by drag and drop, although any input operation is acceptable. In other words, the user can easily define the rows and columns of the report by placing attributes of the data to be analyzed in the column area 132 and the row area 133.

帳票分析の設定画面１３０には、フィルターエリア１３４が設けられている。フィルターエリア１３４には、属性やデータモデルで定義されたセグメントを絞り込み条件として入力することができる。フィルターエリア１３４に絞り込み条件を入力することで、帳票の計算対象とするデータを自由に絞り込むことができる。ここにも所望の属性をドラッグ＆ドロップ操作によって配置できる。 The report analysis settings screen 130 has a filter area 134. In the filter area 134, attributes and segments defined in the data model can be entered as filtering conditions. By entering filtering conditions in the filter area 134, you can freely narrow down the data to be calculated in the report. Desired attributes can also be placed here by dragging and dropping.

帳票分析の設定画面１３０には、値エリア１３５が設けられている。値エリア１３５では、帳票の内容として表示する数値の定義が可能になっている。値エリア１３５に例えば数値型の属性を配置すると、配置した属性の合計値が自動的に計算され、表エリア１３６に表示されている表の各部に表示される。ここにも所望の属性をドラッグ＆ドロップ操作によって配置できる。 The report analysis settings screen 130 has a value area 135. In the value area 135, you can define the numerical values to be displayed as the report contents. For example, if you place a numeric attribute in the value area 135, the total value of the placed attribute is automatically calculated and displayed in each part of the table displayed in the table area 136. You can also place the desired attribute here by dragging and dropping.

この例では、表エリア１３６に表示されている表において、成約の合計値が値に設定されており、２０１８年、第４四半期、商談動機＝Ｗｅｂに該当する箇所の値は破線で囲んで示しているように「８」となっている。この数値を計算する際には、図１４に示すように、まず、「商談」の分析対象データに対して、「年度＝２０１８かつ四半期＝第４四半期かつ商談動機＝Ｗｅｂ」に該当する行を抽出する。「年度」と「四半期」については、ステップＳ２で設定されたリレーション情報に基づいて、「カレンダー」の分析対象データの該当行に紐づいている「商談」の行だけを抽出する。抽出された「商談」の行に対して、属性：成約の合計値を計算する。 In this example, the sum of closed deals is set as the value in the table displayed in table area 136, and the cell corresponding to 2018, fourth quarter, and negotiation motivation = Web shows "8," as indicated by the dashed outline. To calculate this number, as shown in FIG. 14, the rows matching "fiscal year = 2018 and quarter = fourth quarter and negotiation motivation = Web" are first extracted from the "Business Negotiations" data to be analyzed. For "fiscal year" and "quarter," only the "Business Negotiations" rows linked to the matching rows of the "Calendar" data to be analyzed are extracted, based on the relation information set in step S2. The sum of the attribute "closed deals" is then calculated over the extracted "Business Negotiations" rows.
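The calculation of a single report cell described above can be sketched as follows. This is a minimal illustration, not the device's implementation: the table contents, the column names (`date`, `year`, `quarter`, `motivation`, `closed`), the use of a shared date as the relation key, and the function `cell_value` are all assumptions introduced here.

```python
# Hypothetical miniature tables; names and values are illustrative only.
calendar = [
    {"date": "2019-01-15", "year": 2018, "quarter": "Q4"},
    {"date": "2019-02-10", "year": 2018, "quarter": "Q4"},
    {"date": "2019-05-01", "year": 2019, "quarter": "Q1"},
]
negotiations = [
    {"date": "2019-01-15", "motivation": "Web", "closed": 1},
    {"date": "2019-02-10", "motivation": "Web", "closed": 0},
    {"date": "2019-02-10", "motivation": "Referral", "closed": 1},
    {"date": "2019-05-01", "motivation": "Web", "closed": 1},
]

def cell_value(year, quarter, motivation):
    """Sum 'closed' over negotiation rows that match one report cell."""
    # Follow the relation: collect the calendar keys for the requested period.
    dates = {c["date"] for c in calendar
             if c["year"] == year and c["quarter"] == quarter}
    # Keep only negotiation rows linked to those calendar rows and
    # matching the attribute held directly in the negotiation table.
    rows = [n for n in negotiations
            if n["date"] in dates and n["motivation"] == motivation]
    return sum(n["closed"] for n in rows)

print(cell_value(2018, "Q4", "Web"))
```

Because the filter on year and quarter is resolved through the relation at query time, the two tables never have to be joined into a single table beforehand.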

集計方法は合計に限られるものではなく、合計以外に平均、最小、最大等が選択可能であってもよい。数式を使用者が入力することで、より複雑な値を定義できるようにしてもよい。このように、予め設定されているリレーション情報を用いることで、複数の分析対象データを事前に集計、結合することなく、属性を自由に組み合わせた帳票を作成することが簡単にできる。 The aggregation method is not limited to summation; other options such as average, minimum, maximum, etc. may also be selectable. Users may be able to define more complex values by entering formulas. In this way, by using pre-set relation information, it is easy to create reports that freely combine attributes without having to aggregate or combine multiple pieces of data to be analyzed in advance.

図１５は、帳票として設定した値を示す表であり、図１６は、帳票として設定した値に基づいて自動的に作成された帳票の出力画面１４０の例を示している。帳票の出力画面１４０には、帳票が表示される帳票表示領域１４１が設けられている。帳票表示領域１４１には、帳票分析結果がマトリクスに表示されており、この表示は、第３分析部１９が実行する。このように、様々な集計方法を組み合わせることで、ビジネス上の指標を高度なプログラミングを必要とせずに簡単に計算することができる。 Figure 15 is a table showing the values set as a report, and Figure 16 shows an example of an output screen 140 for a report that was automatically created based on the values set as a report. The output screen 140 for the report is provided with a report display area 141 in which the report is displayed. The report display area 141 displays the report analysis results in a matrix, and this display is performed by the third analysis unit 19. In this way, by combining various aggregation methods, business indicators can be easily calculated without the need for advanced programming.

さらに、計算に利用している分析対象データを定期的に最新データに置換することもできる。例えば、最新データが入力されると、古い分析対象データを最新データに自動的に置換することで、定期的に行う集計作業が自動的に実行されることになる。この最新データへの置換は、使用者が手動で行ってもよく、接続先のデータベースから定期的に自動取得するような設定が可能であってもよい。 Furthermore, the analysis target data used in calculations can be periodically replaced with the latest data. For example, when the latest data is input, the old analysis target data is automatically replaced with the latest data, thereby automatically executing the periodic aggregation work. This replacement with the latest data can be performed manually by the user, or it can be configured to be automatically retrieved periodically from the connected database.

以上の例は、図１２に示す分析一覧画面１２０で帳票分析が選択された場合の例であるが、次は、分析一覧画面１２０でツリー分析が選択された場合の例について説明する。第４分析部２０は、帳票分析の結果が表示されるマトリクス上で、例えば基準データと、比較データの選択を使用者から受け付け、受け付けた２つのデータの差異に関連した情報を表示する。その一例として、基準データと比較データとの差異に関連した情報をツリー状に表示するツリー分析を第４分析部２０が実行する。第４分析部２０は、図１７に示すようなツリー分析画面１５０を生成してモニタ３に表示させる。この例では、基準データと比較データとの差異を特定の特徴量に注目して表示させることができる。 The above example covers the case where report analysis is selected on the analysis list screen 120 shown in FIG. 12. Next, an example in which tree analysis is selected on the analysis list screen 120 will be described. The fourth analysis unit 20 accepts from the user a selection of, for example, reference data and comparison data on the matrix displaying the results of the report analysis, and displays information related to the difference between the two selected pieces of data. As one example, the fourth analysis unit 20 performs tree analysis, which displays information related to the difference between the reference data and the comparison data in a tree format. The fourth analysis unit 20 generates a tree analysis screen 150 such as that shown in FIG. 17 and displays it on the monitor 3. In this example, the difference between the reference data and the comparison data can be displayed with a focus on specific feature quantities.

ツリー分析画面１５０に示す例は、同一の分析対象データに対してツリー分析を行っている例である。このツリー分析では、２つの分析グループ（データのサブセット）を指定することで、両グループ間で値の差を掘り下げて分析することができる。 The example shown on the tree analysis screen 150 is an example of tree analysis being performed on the same data to be analyzed. In this tree analysis, by specifying two analysis groups (subsets of data), you can perform an in-depth analysis of the differences in values between the two groups.

ツリー分析画面１５０には、第１の分析グループを指定するための第１指定領域１５１と、第２の分析グループを指定するための第２指定領域１５２とが設けられている。図１７に示す例では、第１の分析グループは「年度＝２０１９かつ四半期＝第４四半期」の条件に合致するデータのサブセットが指定され、また第２の分析グループは「年度＝２０１８かつ四半期＝第４四半期」の条件に合致するデータのサブセットが指定されている。また、ここでの値は、集計分析における値と同様、カラムと集計方法を指定したり、使用者が数式を入力することで定義できる。 The tree analysis screen 150 has a first specification area 151 for specifying a first analysis group and a second specification area 152 for specifying a second analysis group. In the example shown in FIG. 17, the first analysis group specifies a subset of data that meets the conditions "year = 2019 and quarter = 4th quarter," and the second analysis group specifies a subset of data that meets the conditions "year = 2018 and quarter = 4th quarter." Furthermore, like values in aggregation analysis, the values here can be defined by specifying a column and aggregation method, or by the user entering a formula.

ツリー分析画面１５０には、第１指定領域１５１及び第２指定領域１５２の下方にツリー表示領域１５３が設けられている。ツリー表示領域１５３には、分析内容がツリー形式に表示されており、ここに表示さる分析軸の追加ウインドウ１５４の項目名を例えばマウス５でクリックすることで、分析軸を次々に追加していくことができる。分析軸を追加していくことで、２つのグループの間でどの場所で大きな差が発生しているのか、詳細に掘り下げて分析することができる。 The tree analysis screen 150 has a tree display area 153 below the first specification area 151 and the second specification area 152. The tree display area 153 displays the analysis content in a tree format, and analysis axes can be added one after another by clicking, for example, with the mouse 5, on the item names in the analysis axis addition window 154 displayed here. By adding analysis axes, it is possible to perform a detailed analysis to determine where significant differences occur between the two groups.

すなわち、例えば「２０１９年第４四半期と２０１８年第４四半期を比較すると、２０１９年のほうが成約数が６件多かった」という集計結果が帳票から得られたときに、どのような種別の商談で成約に差が付いたのか、どの地区で差が付いたのか、四半期の中でどの月度に差があったのか、といった要素への掘り下げを行いたくなる場合は多いと考えられるが、帳票の場合、複数の軸の組み合わせで掘り下げを行うことは、表の行・列数が組み合わせによって莫大になってしまうので難しい。 For example, when a report gives you the aggregated result, "Comparing the fourth quarter of 2019 and the fourth quarter of 2018, there were six more deals closed in 2019," you'll likely want to drill down to find out what types of deals caused the difference in closed deals, in which regions, and in which months within the quarter there was a difference. However, with reports, drilling down using multiple axes is difficult because the number of rows and columns in the table can become enormous depending on the combination.

それに対して、ツリー形式の分析では、表示されているノードを選択して分析軸を追加していくことで、より直感的な操作で任意の要素の組み合わせによる掘り下げを行うことができる。例えば、図１７に示す例では、商談動機が「紹介」という条件の時に、２つのグループの間で成約率に６ポイントの差が生じていることが分かる。さらに、図１７に示す例では、商談動機が「Ｗｅｂ」という条件では、２つのグループの間で成約率の差は生じていないが、商談動機が「Ｗｅｂ」という条件を掘り下げた場合、月度が「１月度」という条件で、成約率に１ポイントの差が生じていることが分かる。 In contrast, tree-style analysis allows you to select displayed nodes and add analytical axes, allowing you to drill down more intuitively using any combination of elements. For example, in the example shown in Figure 17, it can be seen that when the negotiation motivation is "referral," there is a 6-point difference in the closing rate between the two groups. Furthermore, in the example shown in Figure 17, when the negotiation motivation is "web," there is no difference in the closing rate between the two groups, but when the negotiation motivation is "web" and the month is "January," it can be seen that there is a 1-point difference in the closing rate.
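One level of the drill-down described above can be sketched as follows: for each value of the added analysis axis, the closing rate is computed for both groups and the difference is reported in percentage points. The record layout, the sample data, and the names `closing_rate` and `drill_down` are assumptions made for illustration; the patent does not specify how the fourth analysis unit 20 computes these values.

```python
from collections import defaultdict

def closing_rate(rows):
    """Closing rate in percent; 0.0 for an empty set of rows."""
    return 100.0 * sum(r["closed"] for r in rows) / len(rows) if rows else 0.0

def drill_down(group_a, group_b, axis):
    """Return {axis value: rate(group_a) - rate(group_b)} in percentage points."""
    buckets = defaultdict(lambda: ([], []))
    for r in group_a:
        buckets[r[axis]][0].append(r)
    for r in group_b:
        buckets[r[axis]][1].append(r)
    return {value: closing_rate(a) - closing_rate(b)
            for value, (a, b) in buckets.items()}

# Hypothetical rows for the two analysis groups (e.g. Q4 of 2019 and Q4 of 2018).
group_2019 = [
    {"motivation": "Referral", "closed": 1},
    {"motivation": "Referral", "closed": 1},
    {"motivation": "Web", "closed": 1},
    {"motivation": "Web", "closed": 0},
]
group_2018 = [
    {"motivation": "Referral", "closed": 1},
    {"motivation": "Referral", "closed": 0},
    {"motivation": "Web", "closed": 1},
    {"motivation": "Web", "closed": 0},
]

diff = drill_down(group_2019, group_2018, "motivation")
print(diff)
```

Adding a further axis corresponds to calling `drill_down` again on the rows of one node (for example, only the rows with `motivation == "Web"`) with a different `axis`, which is how a difference hidden at one level can appear at the next.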

一方、ツリー形式では掘り下げを行う要素以外の要素、例えば図１７に示す例では、２０１９年第４四半期と２０１８年第４四半期以外の期間での推移のような情報を見ることはできず、帳票分析と比較してツリー分析では情報の網羅性に欠ける部分がある。そのため、分析の目的に応じて帳票分析とツリー分析とを組み合わせることが有効である。 On the other hand, the tree format does not show anything other than the elements being drilled into. In the example shown in Figure 17, trends in periods other than the fourth quarter of 2019 and the fourth quarter of 2018 cannot be seen, so tree analysis is less comprehensive than report analysis. For this reason, it is effective to combine report analysis and tree analysis according to the purpose of the analysis.

このように、網羅的に数値を確認できる帳票分析と、任意の掘り下げが可能なツリー分析とは互いに補完関係にあるため、分析を個別に実行できるだけでなく、帳票分析とツリー分析を相互に行き来できるようにすることがデータ分析の利便性を向上する上で有効である。本実施形態では、帳票分析からツリー分析を開始する機能を搭載している。 In this way, report analysis, which allows for comprehensive numerical verification, and tree analysis, which allows for arbitrary digging, are complementary to each other. Therefore, not only can the analyses be performed separately, but being able to switch back and forth between report analysis and tree analysis is effective in improving the convenience of data analysis. This embodiment is equipped with a function that allows tree analysis to be started from report analysis.

表示制御部３ａが図１６に示す帳票の出力画面１４０を表示させている状態で、使用者が帳票表示領域１４１に表示されている帳票の２箇所を選択すると、図１８に示すように、帳票表示領域１４１の隣に情報表示領域１４２が生成される。図１８では、使用者により選択された箇所をそれぞれ破線で囲んで示している。 When the display control unit 3a is displaying the report output screen 140 shown in Figure 16 and the user selects two locations on the report displayed in the report display area 141, an information display area 142 is generated next to the report display area 141, as shown in Figure 18. In Figure 18, the locations selected by the user are each surrounded by a dashed line.

情報表示領域１４２には、選択した２箇所の差異に関連した情報が１つまたは複数表示される。情報表示領域１４２に表示される情報が複数存在している場合には、複数の情報が優先度順に表示される。ここでの順は、差異が大きい順であってもよいし、何らかの統計分析を行った結果得られた指標に基づいて決定した順であってもよい。 Information display area 142 displays one or more pieces of information related to the differences between the two selected locations. If there is more than one piece of information to display in information display area 142, the pieces of information are displayed in order of priority. The order here may be in order of the largest difference, or may be determined based on an index obtained as a result of some kind of statistical analysis.

使用者は、情報表示領域１４２に表示された複数の情報の中から、詳細に分析したい項目を選択することができる。使用者による選択操作は、例えば項目をマウス５でクリックする操作等を挙げることができる。項目を選択すると、分析開始ボタン１４３が表示される。分析開始ボタン１４３を使用者が操作すると、主制御部１１は、使用者が選択した項目に基づいて、その内容をツリー分析の設定項目へと自動的に変換し、ツリー分析を開始する。例えば、図１９に示すような変換規則に基づいて、帳票での設定情報と選択状態から、ツリー分析の設定を生成することができる。この変換規則はあくまでも一例であり、他の変換規則を用いてもよい。図１９に示す変換規則に基づいて、図１８に示す帳票分析からツリー分析を開始する。開始されたツリー分析の状態を図２０に示す。 The user can select an item to analyze in detail from the pieces of information displayed in the information display area 142. The selection can be made by, for example, clicking the item with the mouse 5. Once an item is selected, the analysis start button 143 is displayed. When the user operates the analysis start button 143, the main control unit 11 automatically converts the content of the selected item into tree analysis settings and starts the tree analysis. For example, based on conversion rules such as those shown in Figure 19, tree analysis settings can be generated from the setting information and selection state of the report. These conversion rules are merely an example, and other conversion rules may be used. Based on the conversion rules shown in Figure 19, tree analysis is started from the report analysis shown in Figure 18. The state of the started tree analysis is shown in Figure 20.

次に、ツリー分析から帳票分析を開始する場合について説明する。図１７や図２０に示すツリー分析画面１５０を用いてツリー分析で掘り下げを行っていると、２つの分析グループ以外での値について確認したくなることがある。例えば図２１に示すように、ツリー分析画面１５０上では、２０１９年と２０１８年の第４四半期間で成約率を比較した結果、「商談動機＝紹介」の条件で成約率が３．５７％から２３．３３％に大きく変化していることが分かる。しかしながら、このツリー分析では２グループ間の数値のみ比較しているため、上記差異が一過性のものなのか、継続的なトレンドを持っているのかを確認することができない。 Next, we will explain the case where report analysis is started from tree analysis. When digging deeper into tree analysis using the tree analysis screen 150 shown in Figures 17 and 20, you may want to check values outside of the two analysis groups. For example, as shown in Figure 21, the tree analysis screen 150 compares the closing rates between the fourth quarter of 2019 and 2018, and shows that the closing rate changed significantly from 3.57% to 23.33% under the condition "negotiation motivation = referral." However, because this tree analysis only compares the numerical values between the two groups, it is not possible to determine whether the difference is temporary or represents a continuing trend.

図２２に示すように、「商談動機＝紹介」の欄、即ち紹介欄１５３ａを使用者が選択すると、ツリー表示領域１５３の隣に情報表示領域１５５が生成される。図２２では、使用者により選択された箇所を破線で囲んで示している。情報表示領域１５５には、使用者が選択した欄で生じている差異の具体的な数値等が表示される。また、紹介欄１５３ａを使用者が選択すると、推移を確認するための確認ボタン１５５ａが情報表示領域１５５に表示される。確認ボタン１５５ａを使用者が操作すると、ツリー分析の設定に基づいて、帳票分析の設定を自動的に行い、帳票分析を開始する。 As shown in Figure 22, when a user selects the "Negotiation motivation = Referral" column, i.e., the referral column 153a, an information display area 155 is generated next to the tree display area 153. In Figure 22, the area selected by the user is indicated by a dashed line. The information display area 155 displays specific numerical values of the difference occurring in the column selected by the user. Furthermore, when a user selects the referral column 153a, a confirmation button 155a for checking the progress is displayed in the information display area 155. When the user operates the confirmation button 155a, the report analysis settings are automatically configured based on the tree analysis settings, and the report analysis begins.

例えば、図２３に示す変換規則に基づいて、ツリー分析での設定情報から帳票分析の設定を生成することができる。自動設定では、例えば２グループの条件を比較して共通部分と異なる部分を抽出し、共通部分を帳票分析におけるフィルター設定、異なる部分を帳票分析における列の設定とすることができる。この変換規則も一例であり、他の変換規則を用いてもよい。 For example, report analysis settings can be generated from the setting information in tree analysis based on the conversion rules shown in Figure 23. In automatic settings, for example, the conditions of two groups are compared to extract the common and different parts, and the common parts are used as filter settings in report analysis, and the different parts are used as column settings in report analysis. This conversion rule is just one example, and other conversion rules may also be used.

図２２に示す例では、２つのグループの条件間の共通部分は「四半期＝第４四半期」で、異なる部分は「年度」の条件であるため、「年度」を列に設定した帳票を作成し、「四半期＝第４四半期」および選択されている「商談動機＝紹介」をフィルター条件に設定すると、図２４に示す出力画面１４０に表示されるような帳票を自動的に作成できる。この帳票には、ツリー分析で見えていた２０１８年、２０１９年の数値のほかに、他の年（例えば２０２０年）の数値も表示される。このように、２つのグループ間を比較するツリー分析を行っている時またはツリー分析を行った後に、帳票のような別種の分析を任意のタイミングで実行することができる。つまり、使用者は分析対象データの掘り下げを行いながら、必要に応じて着目している２グループの周辺の値を確認することができる。さらに、２つのグループ間で異なる条件が複数存在する場合、異なる条件の内、帳票の列に用いる条件の組を複数生成して、ユーザに提示してもよい。この場合、年や月といった時系列を表す条件を優先的にユーザに提示することもできる。また、帳票を生成する際に、ツリー分析で分析対象としていた第１の指標だけでなく、ユーザが他の帳票分析で利用している第２の指標を自動的に抽出し、生成した帳票に追加してもよい。 In the example shown in Figure 22, the common condition between the two groups is "Quarter = 4th Quarter," while the different condition is "Fiscal Year." Therefore, by creating a report with "Fiscal Year" as a column and setting "Quarter = 4th Quarter" and the selected "Negotiation Motivation = Referral" as filter conditions, a report like the one shown on output screen 140 in Figure 24 can be automatically generated. This report displays the values for other years (e.g., 2020) in addition to the values for 2018 and 2019 displayed in the tree analysis. In this way, other types of analysis, such as reports, can be performed at any time during or after a tree analysis comparing two groups. In other words, users can drill down into the data being analyzed and check the surrounding values of the two groups they are focusing on as needed. Furthermore, if there are multiple different conditions between the two groups, multiple sets of conditions can be generated for the report columns and presented to the user. In this case, conditions that represent time series, such as years or months, can be presented to the user preferentially. Furthermore, when generating a report, not only the first indicator that was the subject of analysis in the tree analysis, but also the second indicator that the user is using in other report analyses may be automatically extracted and added to the generated report.
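The automatic setting described above (shared conditions become filters, differing conditions become columns, and the selected node is added as a filter) can be sketched as follows. The dictionary representation of conditions and the function name `tree_to_report` are assumptions for illustration; the actual conversion rules are those of Figure 23, which may differ in detail.

```python
def tree_to_report(group_a, group_b, selected_node):
    """Derive report settings from the conditions of two tree-analysis groups.

    Each argument is a dict mapping attribute name to value.
    """
    # Conditions shared by both groups become report filters.
    filters = {k: v for k, v in group_a.items() if group_b.get(k) == v}
    # Attributes whose values differ between the groups become report columns.
    columns = sorted(k for k in set(group_a) | set(group_b)
                     if group_a.get(k) != group_b.get(k))
    # The node selected in the tree is applied as an additional filter.
    filters.update(selected_node)
    return {"filters": filters, "columns": columns}

settings = tree_to_report(
    {"year": 2019, "quarter": "Q4"},
    {"year": 2018, "quarter": "Q4"},
    {"motivation": "Referral"},
)
print(settings)
# {'filters': {'quarter': 'Q4', 'motivation': 'Referral'}, 'columns': ['year']}
```

Since "year" is used as a column rather than as a filter, the derived report is not restricted to 2018 and 2019, which is why values for other years also appear in it.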

このように、第４分析部２０によるツリー分析から第３分析部１９による帳票分析を派生されてモニタ３に表示させることができる。ツリー分析から派生させた帳票は元のツリー分析とは独立した分析であるために、必要であれば使用者側でこの帳票の設定を変更することもできる。使用者が所望の条件設定を行うと、その条件が主制御部１１で受け付けられる。これにより、さらに帳票分析を発展させたり、別個の分析として保存することができる。例えば、自動的に生成された設定では、第４四半期の推移を確認することができるが、例えば設定を変えることで、他の四半期を含む時系列の推移を確認することも可能である。 In this way, a report analysis by the third analysis unit 19 can be derived from the tree analysis by the fourth analysis unit 20 and displayed on the monitor 3. Because a report derived from a tree analysis is an analysis independent of the original tree analysis, the user can change the settings of this report if necessary. When the user sets the desired conditions, those conditions are accepted by the main control unit 11. This allows the report analysis to be further developed or saved as a separate analysis. For example, the automatically generated settings allow the trend for the fourth quarter to be confirmed, but by changing the settings, it is also possible to check the time series trend including other quarters.

さらに、別の実施形態として、図２５に示すように、ツリー表示領域１５３に表示されているツリー分析の情報に、帳票分析に相当する内容の情報を重畳表示させることもできる。具体的には、帳票分析に相当する内容の情報を表示するためのウインドウ１５６をツリー表示領域１５３内に表示させる。これにより、ツリー分析の表示形式に合わせた形態で帳票分析に相当する内容の情報を埋め込んで表示することができる。 Furthermore, as another embodiment, as shown in FIG. 25, information equivalent to report analysis can be superimposed on the tree analysis information displayed in the tree display area 153. Specifically, a window 156 for displaying information equivalent to report analysis is displayed within the tree display area 153. This allows information equivalent to report analysis to be embedded and displayed in a form that matches the display format of the tree analysis.

以上のように、図４に示すフローチャートのステップＳ３で作成された帳票を用いることで、使用者が最新の営業指標をモニタリングすることができるようになる。一方、帳票分析で特定の指標の値が悪化していることが分かった場合、その要因を分析することがしばしば必要となる。例えば、ある四半期について、商談が発生した会社と、発生しなかった会社にどのような違いがあったのかを分析したい場合、図４に示すフローチャートのステップＳ４に進み、帳票分析で用いたデータモデルから機械学習を用いた要因分析を実行できる。この要因分析のステップが、第１分析ステップである。 As described above, by using the report created in step S3 of the flowchart shown in Figure 4, users can monitor the latest sales indicators. On the other hand, if report analysis reveals that the value of a particular indicator has worsened, it is often necessary to analyze the causes. For example, if you want to analyze the differences between companies that had sales negotiations and companies that did not for a certain quarter, you can proceed to step S4 of the flowchart shown in Figure 4 and perform factor analysis using machine learning from the data model used in the report analysis. This factor analysis step is the first analysis step.

ステップＳ４に進むと、表示制御部３ａが図２６に示す要因分析の設定画面１７０を生成してモニタ３に表示させる。要因分析の設定画面１７０には、分析の単位を入力する単位入力領域１７１と、分析の目的を入力する目的入力領域１７２と、分析基準日を入力する基準日入力領域１７３とが設けられている。基準日入力領域１７３には、分析対象データを目的変数と特徴量の期間に分割する際の分割点となる日を予測基準日として使用者が入力可能になっている。この予測基準日に基づいて、集計期間をパラメータに持つ各特徴量の値を自動的に再計算することができる。 When proceeding to step S4, the display control unit 3a generates the factor analysis setting screen 170 shown in FIG. 26 and displays it on the monitor 3. The factor analysis setting screen 170 is provided with a unit input area 171 for inputting the unit of analysis, a purpose input area 172 for inputting the purpose of the analysis, and a base date input area 173 for inputting the analysis base date. In the base date input area 173, the user can input the date that will be the division point when dividing the data to be analyzed into periods of the target variable and feature quantities as the prediction base date. Based on this prediction base date, the values of each feature quantity that has the aggregation period as a parameter can be automatically recalculated.

この例では、「商談」データを２０１８年１２月までの集計期間と、２０１９年１月以降の集計期間とに分割し、前者を特徴量の集計に、後者を目的変数の集計に用いるよう設定している。このように期間を分割する設定を行うことで、例えば上記特許文献１に開示されている方法を用いた変換・結合処理によって目的変数と特徴量を自動的に生成し、機械学習向けのデータ変換処理を簡単に実行することができる。 In this example, the "Business Negotiations" data is divided into an aggregation period up to December 2018 and an aggregation period from January 2019 onwards, with the former being used to aggregate features and the latter being used to aggregate objective variables. By dividing the period in this way, objective variables and features can be automatically generated by conversion and combination processing using, for example, the method disclosed in Patent Document 1, and data conversion processing for machine learning can be easily performed.
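The period split at the base date can be sketched as follows. This is only an illustration of the idea of dividing one data set into a feature period and a target period; it is not the conversion and join method of Patent Document 1, and the record layout and the names `split_by_base_date` and `build_training_table` are hypothetical.

```python
from datetime import date

# Hypothetical negotiation records; company IDs and dates are illustrative only.
negotiations = [
    {"company": "A", "date": date(2018, 11, 5)},
    {"company": "A", "date": date(2019, 2, 1)},
    {"company": "B", "date": date(2018, 12, 20)},
    {"company": "B", "date": date(2018, 7, 3)},
]

def split_by_base_date(rows, base_date):
    """Rows before the base date feed features; rows on or after it feed the target."""
    feature_rows = [r for r in rows if r["date"] < base_date]
    target_rows = [r for r in rows if r["date"] >= base_date]
    return feature_rows, target_rows

def build_training_table(rows, base_date):
    """One row per company: a past-count feature and a 0/1 objective variable."""
    feature_rows, target_rows = split_by_base_date(rows, base_date)
    table = {}
    for r in rows:
        table.setdefault(r["company"], {"past_negotiations": 0, "negotiated_after": 0})
    for r in feature_rows:
        table[r["company"]]["past_negotiations"] += 1
    for r in target_rows:
        table[r["company"]]["negotiated_after"] = 1
    return table

table = build_training_table(negotiations, date(2019, 1, 1))
print(table)
```

Keeping the two periods disjoint ensures that no information from the target period leaks into the features.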

また、この実施形態では使用者が帳票から変化点を発見したのち、要因分析の設定を手動で行っていたが、データ分析装置１が帳票から自動的に値の変化点を検出し、開始可能な要因分析を提示してもよい。その場合に、要因分析の設定の一部または全部を、データ分析装置１側が帳票の設定と値とに基づいて自動的に行ってもよい。 In addition, in this embodiment, the user manually configures the factor analysis after discovering a change point from the report, but the data analysis device 1 may automatically detect value change points from the report and present factor analysis that can be started. In that case, some or all of the configuration of the factor analysis may be automatically performed by the data analysis device 1 based on the report settings and values.

要因分析時の処理手順の一例を図２７のフローチャートに基づいて説明する。最初のステップＳＡ１は入力データ解析ステップであり、この入力データ解析ステップでは、入力された分析対象データ、及び複数の分析対象データ間の結合関係と、分析設定を解析する。この解析により、分析対象データ（ここでは会社）に対して、各分析対象データからどのような経路で変換・結合処理を行うかを決定する。 An example of the processing procedure for factor analysis will be explained based on the flowchart in Figure 27. The first step, SA1, is an input data analysis step, in which the input data to be analyzed, the join relationships among the multiple pieces of data to be analyzed, and the analysis settings are analyzed. This analysis determines the path along which each piece of data to be analyzed is converted and joined onto the data that serves as the unit of analysis (here, "Company").

続くステップＳＡ２はパラメータ抽出ステップであり、このパラメータ抽出ステップでは、ステップＳＡ１で解析された情報に基づいて、目的変数と特徴量を生成するために必要なパラメータを生成する。ステップＳＡ２で生成されたパラメータは、特徴量の値を計算するために必要な集計関数や集計対象カラム等の情報を含んでおり、１つの特徴量につき１つのパラメータが生成される。パラメータの例を図２７中に示している。 The following step SA2 is a parameter extraction step, in which the parameters necessary to generate the objective variable and feature quantities are generated based on the information analyzed in step SA1. The parameters generated in step SA2 include information such as the aggregation function and target columns needed to calculate the feature quantity values, and one parameter is generated for each feature quantity. An example of the parameters is shown in Figure 27.

続くステップＳＡ３はＳＱＬ変換ステップであり、このＳＱＬ変換ステップでは、ステップＳＡ２で生成されたパラメータをＳＱＬと呼ばれるプログラミング言語に変換する。 The following step SA3 is an SQL conversion step, in which the parameters generated in step SA2 are converted into a programming language called SQL.

最後のステップＳＡ４はＳＱＬ実行ステップであり、このＳＱＬ実行ステップでは、データベースに対してＳＱＬを使った問い合わせを実行することで、特徴量の値を得る。 The final step, SA4, is an SQL execution step, in which a query using SQL is executed against the database to obtain feature values.
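Steps SA2 to SA4 can be sketched end to end as follows: one parameter describing one feature is converted into an SQL query, and the query is executed against a database to obtain the feature values. The parameter keys, the `activity` table and its columns, and the function `to_sql` are assumptions made for this sketch (the actual parameter format is the one shown in Figure 27). Python's standard `sqlite3` module with an in-memory database stands in for the real database.

```python
import sqlite3

# Hypothetical parameter for one feature:
# "count of activities of type 'Send email' in the 90 days before the base date".
param = {
    "table": "activity", "key": "company_id",
    "func": "COUNT", "column": "id",
    "where_column": "type", "where_value": "Send email",
    "date_column": "date", "base_date": "2019-01-01", "window_days": 90,
}

def to_sql(p):
    """Convert a feature parameter to (sql, args). Identifiers are assumed trusted;
    values are passed as bound placeholders."""
    sql = (f'SELECT {p["key"]}, {p["func"]}({p["column"]}) AS value '
           f'FROM {p["table"]} '
           f'WHERE {p["where_column"]} = ? '
           f'AND {p["date_column"]} >= date(?, ?) AND {p["date_column"]} < ? '
           f'GROUP BY {p["key"]} ORDER BY {p["key"]}')
    args = (p["where_value"], p["base_date"],
            f'-{p["window_days"]} days', p["base_date"])
    return sql, args

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (id INTEGER, company_id INTEGER, type TEXT, date TEXT)")
conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", [
    (1, 1, "Send email", "2018-12-01"),
    (2, 1, "Send email", "2018-11-15"),
    (3, 1, "TEL", "2018-12-20"),         # different activity type
    (4, 2, "Send email", "2018-06-01"),  # outside the 90-day window
    (5, 2, "Send email", "2018-12-31"),
    (6, 1, "Send email", "2019-01-05"),  # on or after the base date
])

sql, args = to_sql(param)
features = conn.execute(sql, args).fetchall()
print(features)  # [(1, 2), (2, 1)]
```

Because the base date and window length are ordinary parameter values, the same feature can be recomputed for a different base date by changing the parameter and repeating the conversion and execution steps.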

分析が完了すると、表示制御部３ａは図２８に示すセグメント出力画面１８０を生成してモニタ３に表示させる。セグメント出力画面１８０には、１つまたは複数のセグメントを表示可能なセグメント表示領域１８１が設けられている。セグメント表示領域１８１には、目的変数と関連度が大きい特徴量と、その特徴量を用いた場合に目的変数の値が高くなるようなセグメントを表示している。すなわち、第１分析部１５は、使用者により指定された目的変数と関連度が大きい特徴量を抽出するとともに、全データの目的変数の平均値を比較して、目的変数の平均値が相対的に高くなるセグメントを抽出してセグメント出力画面１８０に表示させる。尚、第１分析部１５は、前記目的変数と関連度が大きい特徴量と、全データの目的変数の平均値とを比較して、目的変数の平均値が相対的に低くなるセグメントを抽出してセグメント出力画面１８０に表示させてもよい。 When the analysis is complete, the display control unit 3a generates the segment output screen 180 shown in FIG. 28 and displays it on the monitor 3. The segment output screen 180 has a segment display area 181 that can display one or more segments. The segment display area 181 displays feature quantities that are highly related to the objective variable, together with segments defined on those feature quantities in which the value of the objective variable becomes high. That is, the first analysis unit 15 extracts feature quantities that are highly related to the objective variable specified by the user and, by comparison with the average value of the objective variable over all data, extracts segments in which the average value of the objective variable is relatively high and displays them on the segment output screen 180. Note that the first analysis unit 15 may instead extract, by the same comparison with the average value of the objective variable over all data, segments in which the average value of the objective variable is relatively low and display them on the segment output screen 180.

セグメント表示領域１８１に表示されたセグメントに対応してチェックボックス１８１ａが設けられている。このチェックボックス１８１ａについては後述する。 Check boxes 181a are provided corresponding to the segments displayed in the segment display area 181. These check boxes 181a will be described later.

また、セグメント出力画面１８０には、平均値を表示する平均値表示領域１８２も設けられている。平均値表示領域１８２の代わりに、最大値を表示する最大値表示領域であってもよいし、最小値を表示する最小値表示領域であってもよい。 The segment output screen 180 also has an average value display area 182 that displays the average value. Instead of the average value display area 182, it may be a maximum value display area that displays the maximum value, or a minimum value display area that displays the minimum value.

図２８に示す例では、分析対象データの「会社」における平均の商談率は平均値表示領域１８２に表示されているとおり２３．３％であるのに対して、「直近９０日間の活動種別＝“メール送信”の数」が２件以上ある会社では商談率が３８．８％と、平均より１５．５ポイント高いことが分かる。同様に、「直近３０日間の活動種別=“ＴＥＬ”の数」が多い場合にも商談率が高いことも分かり、直近での営業活動のうち、メール送信と電話の回数が商談に影響を与えている可能性が示唆される。 In the example shown in Figure 28, the average negotiation rate for "Company" in the data being analyzed is 23.3%, as displayed in the average value display area 182, whereas for companies where the feature "count of activity type = 'Send email' in the last 90 days" is 2 or more, the negotiation rate is 38.8%, which is 15.5 points above the average. Similarly, the negotiation rate is also high when the feature "count of activity type = 'TEL' in the last 30 days" is large, suggesting that, among recent sales activities, the numbers of emails sent and phone calls made may be influencing negotiations.
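The comparison between a segment and the overall average can be sketched as follows. The sample data, the column names `emails_90d` and `negotiated`, and the helper `rate` are hypothetical; how the first analysis unit 15 actually selects thresholds and segments is not specified in this passage.

```python
def rate(rows):
    """Negotiation rate in percent for a list of company rows."""
    return 100.0 * sum(r["negotiated"] for r in rows) / len(rows)

# Hypothetical company rows; emails_90d stands for the generated feature
# "count of activity type = 'Send email' in the last 90 days".
companies = [
    {"emails_90d": 3, "negotiated": 1},
    {"emails_90d": 2, "negotiated": 1},
    {"emails_90d": 2, "negotiated": 0},
    {"emails_90d": 0, "negotiated": 0},
    {"emails_90d": 1, "negotiated": 0},
    {"emails_90d": 0, "negotiated": 0},
]

overall = rate(companies)
segment = [c for c in companies if c["emails_90d"] >= 2]
lift = rate(segment) - overall  # difference in percentage points

print(f"overall={overall:.1f}% segment={rate(segment):.1f}% lift={lift:+.1f} points")
```

With the figures quoted from Figure 28, the same subtraction gives 38.8 - 23.3 = 15.5 points.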

ここで、「直近９０日間の活動種別＝“メール送信”の数」は、第１分析部１５が、目的変数と関連度が大きい特徴量として自動的に生成した特徴量であり、元のデータモデルには存在しない。尚、「直近９０日間」は、この分析においては２０１９年１月１日を基準日としているため、「２０１９／０１／０１までの９０日間」の期間を意味する。 Here, "count of activity type = 'Send email' in the last 90 days" is a feature quantity that the first analysis unit 15 automatically generated as one highly related to the objective variable; it does not exist in the original data model. Note that, because this analysis uses January 1, 2019 as the base date, "the last 90 days" means the 90 days up to 2019/01/01.

次に、図４に示すフローチャートのステップＳ５に進む。ステップＳ５は特徴量の出力ステップであり、ステップＳ４で生成された特徴量を他の分析でも利用できるように出力する。この出力ステップは出力部１８が実行するものであり、自動的に生成した新しい特徴量や、要因分析に基づいて抽出されたセグメント、目的変数との関連度が大きい特徴量を次回の分析対象となるデータモデルに付加する。 Next, proceed to step S5 of the flowchart shown in Figure 4. Step S5 is a feature output step, in which the feature generated in step S4 is output so that it can be used in other analyses. This output step is executed by the output unit 18, and new automatically generated feature values, segments extracted based on factor analysis, and feature values with a high degree of correlation with the target variable are added to the data model to be analyzed next.

具体的な手順は図３０のフローチャートに示す通りである。まず、ステップＳＢ１において、図２９に示すセグメント出力画面１８０のセグメント表示領域１８１に表示されているセグメントを使用者が選択する。この例では、セグメントに対応するチェックボックス１８１ａをチェックする操作が選択操作であるが、この操作に限定されるものではない。主制御部１１は、使用者による選択操作を受け付けると、セグメント出力画面１８０に出力ボタン１８３を表示させる。使用者が出力ボタン１８３を操作すると、選択された特徴量の特徴量パラメータを読み出し、特徴量の出力先を変更することで、特徴量の値を入力データに対して計算できるようにする。また、必要であれば、他のパラメータ、例えば基準日となる日付を調整してもよい。これが図３０に示すフローチャートのステップＳＢ２の処理である。その後、ステップＳＢ３、ＳＢ４では、それぞれ図２７に示すフローチャートのステップＳＡ３、ＳＡ４と同様に、ＳＱＬ変換ステップとＳＱＬ実行ステップを行う。 The specific procedure is as shown in the flowchart in Figure 30. First, in step SB1, the user selects a segment displayed in the segment display area 181 of the segment output screen 180 shown in Figure 29. In this example, the selection is made by checking the check box 181a corresponding to the segment, but the selection operation is not limited to this. When the main control unit 11 accepts the user's selection, it displays the output button 183 on the segment output screen 180. When the user operates the output button 183, the feature parameters of the selected feature quantity are read and the output destination of the feature quantity is changed, so that the value of the feature quantity can be calculated for the input data. If necessary, other parameters, such as the base date, may also be adjusted. This is the processing of step SB2 in the flowchart shown in Figure 30. Then, in steps SB3 and SB4, an SQL conversion step and an SQL execution step are performed in the same way as steps SA3 and SA4, respectively, of the flowchart shown in Figure 27.

このように、調整後の特徴量パラメータに対して、分析時と同様にＳＱＬ変換と実行を適用することで、分析に用いた特徴量を簡単に入力データにも反映させることができる。また、基準日の調整も同時に行うことで、機械学習による分析で用いた特徴量を、機械学習以外の用途で使いやすい形式に変換することもできる。この基準日は「２０１９／０１／０１」のように日付で指定してもよいし、「現在日時」のように設定をすることで、表示のたびに更新してもよい。 In this way, by applying SQL conversion and execution to the adjusted feature parameters in the same way as during analysis, the features used in the analysis can easily be reflected in the input data. Furthermore, by simultaneously adjusting the base date, the features used in machine learning analysis can be converted into a format that is easy to use for purposes other than machine learning. This base date can be specified as a date, such as "2019/01/01," or can be set to "current date and time" and updated each time the data is displayed.
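The two ways of specifying the base date (a fixed date such as "2019/01/01", or "current date and time" re-evaluated on each display) can be sketched as follows. The function `resolve_window`, the sentinel string `"today"`, and the use of ISO date strings are assumptions made for illustration only.

```python
from datetime import date, timedelta

def resolve_window(base, days, today=None):
    """Return (start, end) of the `days`-day window ending at the base date.

    `base` is either an ISO date string or the hypothetical sentinel "today",
    which is re-resolved on every call so the window follows the current date.
    """
    if base == "today":
        end = today or date.today()
    else:
        end = date.fromisoformat(base)
    return end - timedelta(days=days), end

# Fixed base date, as used during the factor analysis.
print(resolve_window("2019-01-01", 90))
# Floating base date; `today` is injected here only to make the example reproducible.
print(resolve_window("today", 30, today=date(2024, 3, 31)))
```

With a floating base date, "the last 90 days" always refers to the period ending at the time of display, which is the form better suited to uses outside machine learning such as regular reports.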

特徴量の出力が完了した後、図３１に示すように、分析対象データ画面１００が更新される。具体的には、出力された特徴量がデータ表示領域１０３に「会社」の分析用データの属性として追加される。次の分析を開始した際には、この入力データを用いることで、追加した特徴量を元の属性と同じように利用することができる。 After the feature output is complete, the analysis target data screen 100 is updated as shown in Figure 31. Specifically, the output feature is added to the data display area 103 as an attribute of the "Company" analysis data. When the next analysis is started, this input data can be used to utilize the added feature in the same way as the original attribute.

Next, the process proceeds to step S6 of the flowchart shown in FIG. 4. Step S6 is the step in which the second analysis unit 16 executes predictive analysis, that is, the second analysis step in which predictive analysis is executed on the adjusted data model, and the value of the objective variable is predicted for each data item to be predicted.

In step S5 above, it is possible to determine, for example, which features are highly correlated with whether a business negotiation occurs for each company. In actual sales data analysis, beyond simply analyzing factors, the allocation of sales resources can be made more efficient by, for example, extracting companies with a high likelihood of future business negotiations, and predictive analysis is sometimes used for this purpose.

In the predictive analysis execution step, companies with a high probability of a business negotiation occurring in the next 90 days are predicted based on the company data obtained in step S5. In this step, the display control unit 3a generates the predictive analysis setting screen 190 shown in FIG. 32 and displays it on the monitor 3. Like the factor analysis setting screen 170 shown in FIG. 26, the predictive analysis setting screen 190 has a unit input area 191 for entering the unit of analysis, a purpose input area 192 for entering the purpose of the analysis, and a reference date input area 193 for entering the analysis reference date. In the reference date input area 193, both a reference date for learning (learning reference date) and a reference date for prediction that differs from the learning reference date can be set as analysis reference dates.

In predictive analysis, there are also many applications that require not only accurate predictions but also the grounds for the prediction results. In this embodiment, therefore, the scoring calculation method can be selected in the predictive analysis settings from a rule-based method and a machine learning method. The display control unit 3a generates the scoring setting screen 200 shown in FIG. 33 and displays it on the monitor 3. The scoring setting screen 200 has a method selection area 201 in which the user can select either the rule method or the machine learning method. The method can be selected by button operation or the like, but any selection mechanism may be used. The method selection area 201 contains a brief explanation of each of the rule method and the machine learning method. The scoring setting screen 200 also has a rule creation method selection area 202, which displays two options: "Automatic generation", in which the data analysis device 1 generates the rules automatically, and "Specify rules", in which the user specifies arbitrary rules. The user can select one of these options. The scoring setting screen 200 further has an input area 203 for entering the number of rules, in which the user can enter any number of rules, and a selection area 204 for the attributes to be used when creating rules, in which the user can select any number of attributes, one or more.

In the machine learning method, the predicted value (score) representing the probability that the objective variable is 1 is calculated from the prediction results output by the machine learning model. In the rule method, by contrast, the score is calculated by counting, for each row to be predicted, how many rules (conditional expressions) the row satisfies. If, for example, four rules are set on the scoring setting screen 200, calculating the number of matched rules for each row yields a match count of 0 to 4 per row, as shown in FIG. 34. The score for each row can then be calculated by looking up this match count in a separately calculated correspondence table of match counts and scores. This correspondence table may be calculated in advance, for example by aggregating the training data. A rule may be expressed as a combination of an attribute and a value, such as "whether the company size matches 'A'", or it may be expressed using a segment defined in the data model, such as "whether the row belongs to segment X".
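The rule method described above can be sketched as follows. This is an illustration under stated assumptions, not the device's implementation: the four rules, the attribute names, and the choice of the mean objective value per match count as the score are all hypothetical (the text only says the correspondence table is obtained by aggregating training data).

```python
from collections import defaultdict

# Hypothetical rules: each one is a predicate over a row (a dict of attributes).
RULES = [
    lambda row: row["company_size"] == "A",
    lambda row: row["location"] == "Tokyo",
    lambda row: row["emails_90d"] >= 3,
    lambda row: row["in_segment_x"],
]


def match_count(row):
    """Number of rules the row satisfies (0 to len(RULES))."""
    return sum(1 for rule in RULES if rule(row))


def build_score_table(training_rows):
    """Map each match count to the mean objective value seen in training data.

    Using the mean is an assumption made for this sketch.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in training_rows:
        k = match_count(row)
        sums[k] += row["target"]
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in counts}


def score(row, table, default=0.0):
    """Look up the row's match count; unseen counts fall back to `default`."""
    return table.get(match_count(row), default)


training = [
    {"company_size": "A", "location": "Tokyo", "emails_90d": 5, "in_segment_x": True, "target": 1},
    {"company_size": "A", "location": "Tokyo", "emails_90d": 5, "in_segment_x": True, "target": 0},
    {"company_size": "B", "location": "Osaka", "emails_90d": 0, "in_segment_x": False, "target": 0},
]
table = build_score_table(training)
candidate = {"company_size": "A", "location": "Tokyo", "emails_90d": 4, "in_segment_x": True}
print(table, score(candidate, table))
```

Because the score depends only on which rules a row satisfies, the grounds for each prediction can be read directly from the matched rules.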

The rules themselves may be specified by the user performing the analysis, or the analysis engine may automatically generate rules that extract groups with high objective variable values by using an analysis technique such as decision tree analysis.

When a scoring method is selected and the analysis is started, the analysis unit generates the training data and the prediction data used for machine learning. If the data to be analyzed contains features that have a reference date as a parameter, the features generated from the training data and those generated from the prediction data may differ, because the reference date of the training data differs from that of the prediction data. The reference date is therefore readjusted, and the aggregation period of the prediction data is automatically adjusted based on the reference date of the prediction data. This is the data adjustment step of adjusting the data model set by the data model setting unit 12b based on the analysis setting information, and it is executed by, for example, the first adjustment unit 13 or the second adjustment unit 14. That is, because the features added in step S5 of the flowchart shown in FIG. 4 have a reference date, the feature values are recalculated with a reference date of January 1, 2019 for the training data and a reference date of April 1, 2019 for the prediction data, as shown in FIG. 35. In the subsequent conversion and joining process, the value of the objective variable is added to the training data by the method described in Patent Document 1 above.
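To make the recalculation concrete, the sketch below evaluates one feature with the two reference dates mentioned above. The feature ("e-mails sent in the 90 days before the reference date"), the sample log, and the function name `emails_in_window` are assumptions chosen for illustration; the device itself performs this step through SQL conversion and execution.

```python
from datetime import date, timedelta

# Hypothetical activity log: (company ID, date an e-mail was sent).
EMAIL_LOG = [
    ("c1", date(2018, 11, 15)),
    ("c1", date(2018, 12, 20)),
    ("c1", date(2019, 2, 10)),
    ("c2", date(2019, 3, 5)),
]


def emails_in_window(company_id, reference_date, days=90):
    """Count e-mails sent in the `days` days before reference_date.

    The window is [reference_date - days, reference_date), so the reference
    date itself is excluded.
    """
    start = reference_date - timedelta(days=days)
    return sum(
        1
        for cid, sent in EMAIL_LOG
        if cid == company_id and start <= sent < reference_date
    )


learning_reference = date(2019, 1, 1)    # reference date for training data
prediction_reference = date(2019, 4, 1)  # reference date for prediction data

for cid in ("c1", "c2"):
    print(
        cid,
        emails_in_window(cid, learning_reference),
        emails_in_window(cid, prediction_reference),
    )
```

The same feature definition yields different values for the same company once the reference date moves, which is why the aggregation period has to be readjusted before prediction.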

When data generation is complete, the model is trained on the training data, and a predicted value (score) is calculated for each row of the prediction data. When prediction is complete, the display transitions to the predicted value display screen 210 (shown in FIG. 36), which previews the predicted values. That is, the display control unit 3a generates the predicted value display screen 210 and displays it on the monitor 3.

The predicted value display screen 210 has a list display area 211, in which companies are listed in descending order of predicted value. Next to the list display area 211, the predicted value display screen 210 also has a graph display area 212 and a filter setting area 213. Moving the right or left end of the slide bar 212a in the graph display area 212 to the left or right increases or decreases the number of data items displayed in the list display area 211. The slide bar 212a is one example of an increase/decrease operation unit with which the user changes the number of data items; the number of data items may also be made adjustable by means other than the slide bar 212a.

In the filter setting area 213, the data displayed in the list display area 211 can be narrowed down to the items that meet specified conditions. For example, to extract the top 100 companies that should be prioritized for sales activities based on their negotiation prospects, the user adjusts the number of items to 100 in the graph display area 212 and the filter setting area 213 and then downloads the data previewed in the list display area 211, thereby creating a customer list for sales activities.

In general, as shown in the list display area 211 of FIG. 36, arranging the candidate targets of a measure in descending order of predicted score is effective when planning the measure. However, if only the very few top-ranked items displayed in the list display area 211 are chosen as the data range to which the measure is applied, the average score rises and the measure can be implemented efficiently, but the total profit obtained from the measure may be small because there are few targets. Conversely, as the number of targets is increased, at some point the cost of implementing the measure exceeds the profit, and beyond that point the profit falls further as more targets are added.

In applications such as the company list for sales activities in this example, the upper limit on the number of targets is often determined by the total amount of sales resources, so there is usually little room for adjustment. For measures such as sending direct mail or running internet advertising, on the other hand, the number of targets can often be controlled, and it is desirable to choose the number of targets that maximizes profit. It is also sometimes desirable to know in advance the return on investment (ROI) that will be obtained if the measure is implemented.

The data analysis device 1 according to this embodiment therefore has a function for calculating the expected ROI. Specifically, the display control unit 3a generates an ROI calculation area 214 as shown in FIG. 37 and displays it superimposed on the predicted value display screen 210. The ROI calculation area 214 has a cost input area 214a for entering the cost per application of the measure (C), a profit input area 214b for entering the profit obtained per acquisition, that is, per goal achieved (R), a count display area 214c that displays the number of targets (N), a score display area 214d that displays the average score of the targets (p), and an ROI display area 214e that displays the calculated ROI. The count display area 214c displays the number of data items included in the data range to which the measure should be applied, and this number can be adjusted in the graph display area 212 and the filter setting area 213. When the user enters amounts in the cost input area 214a and the profit input area 214b, the main control unit 11 calculates the number of selected items (N) and the average score over the selected range (p) from the current range selection.

From these values, the total cost of the measure is calculated as N × C and the total profit obtained from the measure as N × R × p, so the main control unit 11 calculates the ROI with the formula N × R × p - N × C. The result is displayed in the ROI display area 214e. Because the ROI is recalculated in conjunction with the on-screen slide bar 212a and the filter settings, the user can decide the number of targets (the number of data items to which the measure should be applied) while taking the ROI into account. Alternatively, the data analysis device 1 may automatically calculate the number of targets that maximizes the ROI and present it to the user.
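The formula above, together with the automatic choice of the number of targets, can be sketched as follows. The function names and sample values are hypothetical, and the exhaustive scan over candidate sizes is simply one straightforward way to find the maximum, not necessarily the method the device uses.

```python
def expected_roi(scores, cost_per_item, profit_per_win):
    """Return N * R * p - N * C for the selected scores, as defined in the text.

    N is the number of selected items, p their average score, C the cost per
    item and R the profit per achieved goal.
    """
    n = len(scores)
    if n == 0:
        return 0.0
    p = sum(scores) / n
    return n * profit_per_win * p - n * cost_per_item


def best_target_size(scores, cost_per_item, profit_per_win):
    """Scan the top-n selections (n = 1..len) and return (best n, best ROI).

    Items are ranked by descending score, mirroring the list display.
    Returns (0, 0.0) when no selection has a positive ROI.
    """
    ranked = sorted(scores, reverse=True)
    best_n, best_value = 0, 0.0
    for n in range(1, len(ranked) + 1):
        value = expected_roi(ranked[:n], cost_per_item, profit_per_win)
        if value > best_value:
            best_n, best_value = n, value
    return best_n, best_value


# Hypothetical predicted scores, cost C = 100 and profit R = 250.
scores = [0.3, 0.9, 0.1, 0.6]
print(best_target_size(scores, cost_per_item=100, profit_per_win=250))
```

With these sample values the ROI rises while high-scoring items are added and falls once the added items' expected profit (R × score) drops below the cost C, which matches the trade-off described for the number of targets.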

In the example above, the user configured the predictive analysis manually, but, just as factor analysis is started from report analysis in step S4, the device can also propose starting predictive analysis from factor analysis. Because factor analysis and predictive analysis both require an objective variable to be set, the setting of the objective variable can be omitted when predictive analysis is started from factor analysis.

An example of linking factor analysis to predictive analysis, that is, of starting predictive analysis from segments selected in factor analysis, will be described with reference to FIG. 38. The screen in the upper part of FIG. 38 is the segment output screen 180 used when outputting features. The screen in the lower part of FIG. 38 is the scoring setting screen 200. By adopting the segments selected on the segment output screen 180 as rules in rule-method predictive analysis, as shown on the scoring setting screen 200, the user can freely choose, from the segments discovered by factor analysis, those that are preferable for reasons such as being easy to interpret in business terms, and start a predictive analysis that uses them.

Next, the process proceeds to step S7 of the flowchart shown in FIG. 4. Step S7 is the segment output step. In step S6, companies with high negotiation prospects can be extracted, and the user may want to use the extracted list not only for sales activities but also for analysis. In that case, operating the "Output to segment" button 215 on the predicted value display screen 210 shown in FIG. 36 saves the displayed company list. FIG. 39 shows the segment save screen 220. The display control unit 3a generates the segment save screen 220 and displays it on the monitor 3. The segment save screen 220 has a name display area 221 that displays the name under which the segment is saved. In this example, the top 100 companies are saved as a segment named "Top 100 negotiation prospects". For a segment saved in the data model, the definition and the proportion of matching entries can be checked from, for example, the segment output screen 180 shown in FIG. 28. On the segment save screen 220 shown in FIG. 39, the user can not only check segments generated from analysis results but also add new segments.

A segment is defined for one of the analysis data sets. A segment may have any definition as long as it extracts a portion of the analysis data; for example, it can be defined using the conditional expression setting screen 230 shown in FIG. 40. The conditional expression setting screen 230 shown in FIG. 40 is generated by the display control unit 3a and displayed on the monitor 3. The conditional expression setting screen 230 has a conditional expression input area 231, and two or more conditional expression input areas 231 may be provided. In this example, a segment can be defined by combining one or more conditional expressions, such as "company size is A and location is Tokyo". Alternatively, a segment may be defined by matching against another table, such as "rows that can be associated with a row in a lookup table".

A segment generated from predictive analysis can be defined by matching against a table generated inside the predictive analysis, as shown in FIG. 41. With a segment that uses a lookup table, membership for a given ID does not change even when the analysis data is updated. With a segment that uses conditional expressions, membership for the same ID may change when the analysis data is updated and the attribute values change. The former is therefore suited to applications in which the members should stay fixed, such as "the group that was targeted by a measure at a certain point in time", while the latter is suited to applications in which the current members are of interest, such as "the group that accessed the web page in the past week".
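The difference between the two kinds of segment can be shown with a small sketch. The company records, the ID set, and the function names are hypothetical and serve only to illustrate how membership behaves when the analysis data is updated.

```python
# Hypothetical analysis data keyed by company ID.
companies = {
    "c1": {"company_size": "A", "location": "Tokyo"},
    "c2": {"company_size": "B", "location": "Osaka"},
}

# Lookup-table segment: membership is a fixed set of IDs captured at save time.
TOP_PROSPECT_IDS = {"c2"}


def in_lookup_segment(company_id):
    """Membership depends only on the saved ID table."""
    return company_id in TOP_PROSPECT_IDS


def in_conditional_segment(company_id):
    """Membership is re-evaluated against the current attribute values."""
    row = companies[company_id]
    return row["company_size"] == "A" and row["location"] == "Tokyo"


before = (in_lookup_segment("c1"), in_conditional_segment("c1"))

# The analysis data is updated: company c1 relocates.
companies["c1"]["location"] = "Osaka"

after = (in_lookup_segment("c1"), in_conditional_segment("c1"))
print(before, after)
```

Only the conditional segment reacts to the update; the lookup-table segment keeps the membership it had when the table was created.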

Next, the process proceeds to step S8 of the flowchart shown in FIG. 4. Step S8 is the report update step, in which a new report analysis is started using the segment saved in step S7. The display control unit 3a generates the setting screen 250 shown in FIG. 42 and displays it on the monitor 3. On this setting screen 250, another value is defined in addition to the values used when the report was created in step S3. Specifically, a new value called "Number of phone calls" is defined so that the amount of activity is visualized for each sales representative. In addition, the entire report is narrowed down by the condition "Top 100 negotiation prospects" created in step S6, which limits the aggregation to the companies with negotiation prospects extracted in step S7.

The features output in step S5 can also be used in this report. For example, by using a feature as a filter condition, the report can be viewed under a condition such as "companies that are among the top 100 negotiation prospects but have had zero e-mails sent in the last 90 days". By producing a report of per-representative indicators for the companies with high negotiation potential extracted in step S6, it is possible to monitor whether each sales representative is actually carrying out activities with the extracted companies and whether negotiations or closed deals are being obtained.

(Effects of the embodiment)
As described above, once the user inputs the data to be analyzed into the data analysis device 1 and sets up a data model by defining the row correspondences between the data sets, factor analysis and predictive analysis can be performed using that data model. If features or segments are obtained as a result of an analysis, they can be added to the data model, and the aggregation parameters of the features can be automatically readjusted according to the analysis setting information. As a result, when using features in multiple analyses, the user can perform a variety of analyses on common input data without individually readjusting the features to suit each analysis and without relying on an expert with advanced programming skills.

Enabling a variety of analyses from common input data in this way greatly reduces the preparation the user must do for analysis and makes it easy to reuse useful insights found in one analysis in other analyses.

For analyses that are performed periodically, updating the input data to the latest version also makes it possible to update multiple analyses at once, so the reduction in labor is even greater when data analysis is built into a business workflow and executed repeatedly.

The embodiments described above are merely illustrative in all respects and must not be interpreted restrictively. Furthermore, all modifications and changes that fall within the range of equivalents of the claims are within the scope of the present invention.

As described above, the data analysis device and data analysis method according to the present invention can be used, for example, to aggregate and visualize the various data held by a company.

1 Data analysis device
3 Monitor (display unit)
3a Display control unit
12a Data input unit
12b Data model setting unit
13 First adjustment unit
14 Second adjustment unit
15 First analysis unit
16 Second analysis unit
18 Output unit
19 Third analysis unit
20 Fourth analysis unit

Claims

1. In a data analysis device for analyzing data,
a data input unit for inputting a plurality of tabular data having a plurality of feature quantities;
a data model setting unit that accepts setting of relation information that defines a correspondence relationship between the plurality of tabular data items inputted to the data input unit and sets a data model to be analyzed;
a data adjustment unit that adjusts the data model set by the data model setting unit based on analysis setting information;
a first analysis unit that performs a first analysis on the data model set by the data model setting unit and generates a first analysis result;
a second analysis unit that performs a second analysis on the data model adjusted by the data adjustment unit and generates a second analysis result;
an output unit that adds new feature quantities included in the analysis results of at least one of the first analysis unit and the second analysis unit to a data model to be analyzed next;
The first analysis unit
As the analysis setting information, an objective variable specified by a user and a setting of a learning reference date can be accepted,
extracting features highly related to the specified objective variable, and performing a factor analysis to extract segments in which the average value of the objective variable is relatively high or low compared to the average value of the objective variable of all data, by aggregating the features based on data aggregated for a period before a learning reference date set by a user as the analysis setting information, and aggregating the objective variable based on data aggregated for a period after the learning reference date;
the output unit adds a feature quantity having a high degree of association with the objective variable to a data model to be analyzed next based on a result of the factor analysis executed by the first analysis unit;
The second analysis unit
The analysis setting information can accept a setting of a prediction reference date that is different from the learning reference date,
If the data model to be predicted includes a feature having an aggregation period as a parameter, automatically recalculating the value of each feature aggregated by the first analysis unit based on the prediction reference date and the aggregation period;
a data analysis device configured to be capable of executing predictive analysis that predicts the value of the objective variable for each data item to be predicted, based on a data model to which the recalculated feature quantities, extracted by the factor analysis executed by the first analysis unit as feature quantities highly associated with the objective variable, have been added.

2. The data analysis apparatus according to claim 1,
the data model setting unit further receives a setting of a segment for extracting a portion of data from the plurality of tabular data, and sets the data model to be analyzed;
The data analysis device is characterized in that the output unit further adds new segments included in the analysis results of at least one of the first analysis unit and the second analysis unit to the data model to be analyzed next.

3. The data analysis apparatus according to claim 1,
the first analysis unit automatically generates new features that do not exist in the original data model as features that have a high degree of association with the objective variable;
The data analysis apparatus is characterized in that the output unit adds the new feature quantity automatically generated by the first analysis unit to a data model to be analyzed next.

4. The data analysis device according to claim 3,
The data analysis apparatus is characterized in that the output unit further adds the segments extracted based on the factor analysis performed by the first analysis unit to a data model to be analyzed next.

5. The data analysis apparatus according to claim 1,
The data analysis apparatus is characterized in that the second analysis unit performs predictive analysis to predict the value of the objective variable for each data item to be predicted.

6. The data analysis apparatus according to claim 5,
The first analysis unit is capable of accepting a setting of a prediction base date as the analysis setting information, and if the data model to be predicted includes a feature having an aggregation period as a parameter, the first analysis unit automatically recalculates the value of each feature having the aggregation period as a parameter based on the prediction base date.

7. The data analysis apparatus according to claim 1,
The data analysis device is characterized in that the second analysis unit scores the predictive analysis according to a method selected by a user from either a rule-based method or a machine learning method.

8. The data analysis device according to claim 5,
The second analysis unit displays the objective variables for each data item to be predicted by the predictive analysis in descending order of score, and accepts input from a user of the data range to which the measures should be applied, the cost per measure, and the profit to be obtained per achievement of the objective, thereby calculating the total cost and total profit to be obtained when the measures are applied to the data range.

9. The data analysis device according to claim 5,
The second analysis unit is a data analysis device that accepts input of the cost per measure and the profit obtained per achievement of the objective, calculates the total cost of the measure and the total profit obtained by the measure, and automatically calculates the number of data items for which the measure should be implemented.

10. The data analysis device according to claim 5,
The data analysis device is characterized in that the output unit outputs a portion of data with high scores among the objective variables for each data item to be predicted by the predictive analysis as a segment and adds it to the data model to be analyzed next.

11. The data analysis apparatus according to claim 1,
a third analysis unit that executes a form analysis based on the data model and displays a result of the form analysis in a matrix;
A data analysis device that receives a selection of reference data and comparison data from a user on the matrix, and further displays information related to the difference between the two received data.

12. The data analysis device according to claim 11,
A data analysis device further comprising a fourth analysis unit that performs tree analysis to display information related to the differences between the two data in a tree-like manner, and displays the differences between the two data by focusing on specific feature quantities.

13. The data analysis device according to claim 12,
A data analysis device characterized in that it is configured to be able to derive and display a form analysis by the third analysis unit from the tree analysis by the fourth analysis unit.

14. The data analysis device according to claim 11,
The setting screen of the third analysis unit is provided with a column area and a row area for defining rows and columns of the form analysis, and a value area for defining numerical values to be displayed as the content of the form analysis,
A data analysis device characterized in that when a numeric attribute is placed in the value area, a total value of the placed attribute is calculated.
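The row area, column area, and value area can be pictured as the three inputs of a pivot table in which a numeric attribute placed in the value area is summed. The following is a minimal sketch under that reading; the function `pivot_sum` and the record fields (`region`, `product`, `sales`) are hypothetical and do not appear in the source.

```python
from collections import defaultdict


def pivot_sum(records, row_attr, col_attr, value_attr):
    """Sum value_attr for every (row_attr, col_attr) combination."""
    table = defaultdict(lambda: defaultdict(float))
    for record in records:
        table[record[row_attr]][record[col_attr]] += record[value_attr]
    # Convert nested defaultdicts to plain dicts for display.
    return {row: dict(cols) for row, cols in table.items()}


# Hypothetical records: "sales" is the numeric attribute in the value area.
records = [
    {"region": "East", "product": "X", "sales": 10},
    {"region": "East", "product": "X", "sales": 5},
    {"region": "West", "product": "Y", "sales": 7},
]
matrix = pivot_sum(records, "region", "product", "sales")
```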

The data analysis device according to claim 12,
the analysis screen of the fourth analysis unit is provided with a first designation area for designating a first analysis group, a second designation area for designating a second analysis group, and a tree display area,
A data analysis device characterized in that the tree display area further displays the aggregated results of the first analysis group designated in the first designation area and the second analysis group designated in the second designation area, and an analysis axis addition window for adding an analysis axis to the aggregated results.

In a data analysis method for analyzing data,
a data input step of inputting a plurality of tabular data having a plurality of feature quantities;
a data model setting step of accepting setting of relation information that defines relationships between feature quantities included in the plurality of tabular data inputted in the data input step and setting a data model to be analyzed;
a data adjustment step of adjusting the data model set in the data model setting step based on analysis setting information;
a first analysis step of performing a first analysis on the data model set in the data model setting step to generate a first analysis result;
a second analysis step of performing a second analysis on the data model adjusted in the data adjustment step to generate a second analysis result;
an output step of adding new feature quantities included in the analysis results of at least one of the first analysis step and the second analysis step to a data model to be analyzed next time,
In the first analysis step,
accepting an objective variable and a learning reference date specified by a user as the analysis setting information;
extracting feature quantities highly related to the specified objective variable, and performing a factor analysis that extracts segments in which the average value of the objective variable is relatively high or low compared with its average value over all data, wherein the feature quantities are aggregated from data for a period before the learning reference date and the objective variable is aggregated from data for a period after the learning reference date;
In the output step, based on the result of the factor analysis executed in the first analysis step, a feature quantity having a high degree of association with the objective variable is added to a data model to be analyzed next time;
In the second analysis step,
accepting a setting of a prediction reference date different from the learning reference date as the analysis setting information;
if the data model to be predicted includes a feature quantity having an aggregation period as a parameter, automatically recalculating the value of each feature quantity calculated in the first analysis step based on the prediction reference date and the aggregation period;
A data analysis method characterized by executing a predictive analysis that predicts a value of the objective variable for each data item to be predicted, based on a data model to which the recalculated feature quantities, extracted by the factor analysis executed in the first analysis step as feature quantities highly associated with the objective variable, have been added.
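The relationship between the learning reference date and the prediction reference date can be sketched as follows. This is an illustrative assumption, not the claimed implementation: feature quantities are sums over a window of `period_days` ending at the reference date, the objective variable is a sum over a window starting at the reference date, and recalculation for prediction simply reuses the same aggregation with the prediction reference date. All names and the sample events are hypothetical.

```python
from datetime import date, timedelta


def aggregate(events, start, end):
    """Sum amounts per key for events with start <= day < end."""
    totals = {}
    for key, day, amount in events:
        if start <= day < end:
            totals[key] = totals.get(key, 0) + amount
    return totals


def features_at(events, reference_date, period_days):
    """Feature quantity: aggregate over the period BEFORE the reference date."""
    start = reference_date - timedelta(days=period_days)
    return aggregate(events, start, reference_date)


def objective_at(events, reference_date, period_days):
    """Objective variable: aggregate over the period AFTER the reference date."""
    end = reference_date + timedelta(days=period_days)
    return aggregate(events, reference_date, end)


# Hypothetical purchase events: (customer, day, amount).
events = [
    ("A", date(2021, 5, 10), 100),
    ("A", date(2021, 8, 15), 50),
    ("B", date(2021, 6, 20), 30),
    ("B", date(2021, 9, 5), 70),
    ("B", date(2021, 9, 25), 20),
]

learning_date = date(2021, 7, 1)
prediction_date = date(2021, 10, 1)

# Learning: features before the learning date, objective after it.
learn_features = features_at(events, learning_date, period_days=90)
learn_objective = objective_at(events, learning_date, period_days=60)

# Prediction: the same feature definition, recalculated at the new date.
predict_features = features_at(events, prediction_date, period_days=90)
```

Because the aggregation period is a parameter of the feature quantity, only the reference date changes between learning and prediction; the feature definition itself is reused.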