JP5965267B2

JP5965267B2 - Information processing apparatus and information processing method

Info

Publication number: JP5965267B2
Application number: JP2012213919A
Authority: JP
Inventors: 全徳冨山; 夏樹石田
Original assignee: 株式会社日立ソリューションズ西日本
Priority date: 2012-09-27
Filing date: 2012-09-27
Publication date: 2016-08-03
Anticipated expiration: 2032-09-27
Also published as: JP2014067361A

Description

The present invention relates to information processing technology, and in particular to technology for estimating frequencies based on the frequency distributions of different regions. One example is technology for creating information that supports analysis work in which the characteristics of migrants are estimated from regional information.

Frequency distribution tables are widely used in both academic and practical settings.
For example, in corporate sales activities, deepening business relationships with customers and raising customer loyalty calls for analysis that captures the characteristics of each region, with the aim of identifying the uniqueness and heterogeneity of a region from regional information. There is therefore a need for analysis techniques that extract information useful for grasping regional characteristics from demographic statistics, finance-related information, and consumer purchasing power information tabulated by region.

Analysis methods aimed at supporting sales activities using regional information already exist. For example, Patent Document 1 below generates information useful to a seller of goods by analyzing sales performance information based on information about the region where the purchasers of the goods reside.

Patent Document 1: JP 2009-169699 A

When analysis is based on regional information, the region generally needs to be subdivided. Subdividing the market and viewing it at a small scale makes fine-grained changes visible and reveals the needs of each area. For this reason, it is effective to subdivide not into large groupings such as prefectures or municipalities but into small units such as individual towns.

However, some databases of regional information hold detailed information only down to the municipality level, with the detailed information missing at the town level.

As a result, regional information is analyzed only in units of upper-level areas such as municipalities. Market changes in lower-level areas such as towns cannot be detected, and appropriate decisions and improvement actions cannot be taken.

More generally, when a characteristic obtained by experiment or the like is divided into gradations and the frequency of each gradation is obtained to form a frequency distribution, it is useful to be able to interpolate the frequency distribution data between different measurement regions.

An object of the present invention is to determine the relationship between the frequency distribution of a large region and that of a different region, such as a smaller one, and to perform information processing that interpolates the frequency distribution.

According to one aspect of the present invention, there is provided an information processing apparatus comprising: a representative value comparison unit that compares a first representative value of a first frequency distribution in a first region with a second representative value of a second frequency distribution in a second region different from the first region; a distribution conversion unit that, when the representative value comparison unit finds that the degree of coincidence between the first representative value and the second representative value is low, converts the second frequency distribution in a direction that reduces the deviation between the first frequency distribution and the second frequency distribution; and a distribution estimation unit that obtains a new second representative value based on the second frequency distribution converted by the distribution conversion unit, has the representative value comparison unit compare the new second representative value with the first representative value, and continues the process of obtaining the second representative value until the degree of coincidence between the first representative value and the new second representative value becomes high.

The apparatus is further characterized by a frequency distribution calculation unit that calculates the frequencies of the second frequency distribution based on the second representative value and the frequency distribution estimated by the distribution estimation unit.

The present invention is also an information processing apparatus provided with a regional information database, comprising: a comparison unit that compares two average values stored in advance in the regional information database, one for an upper-level area and one for an area that is a lower-level subdivision of it; a conversion unit that, based on the comparison result, obtains a conversion method for bringing the frequency distribution of the upper-level area closer to that of the lower-level area; an estimation unit that uses the conversion method to estimate the frequency distribution of the lower-level area from that of the upper-level area; and a bootstrap processing unit that performs bootstrap processing to obtain the population mean from the estimated frequency distribution. This allows the frequency distribution of the lower-level area to be estimated accurately.

According to another aspect of the present invention, there is provided an information processing method comprising: a representative value comparison step of comparing a first representative value of a first frequency distribution in a first region with a second representative value of a second frequency distribution in a second region different from the first region; a distribution conversion step of, when the degree of coincidence between the first representative value and the second representative value is found to be low in the representative value comparison step, converting the second frequency distribution in a direction that reduces the deviation between the first frequency distribution and the second frequency distribution; and a distribution estimation step of obtaining a new second representative value based on the second frequency distribution converted in the distribution conversion step, comparing the new second representative value with the first representative value in the representative value comparison step, and continuing the process of obtaining the second representative value until the degree of coincidence between the first representative value and the new second representative value becomes high.

The present invention may also be a program for causing a computer to execute the information processing method described above, or a computer-readable recording medium on which the program is recorded.

The present invention has the following effect.
Even when part of the attribute value information of one of the different regions is missing, the missing part is estimated from the frequency distribution and attribute average value of one region and the attribute average value of the other, and is provided to the user, which enables feature analysis based on the estimate.

FIG. 1 is a system configuration diagram including an information processing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a frequency distribution used in the information processing of the present embodiment.
FIG. 3 is a flowchart showing the flow of information processing according to the present embodiment, outlining the information processing performed in the host system.
FIG. 4(a) shows tabulated deposit and savings data at the municipality level, such as City B in Prefecture A, which is the upper-level area. FIG. 4(b) shows tabulated deposit and savings data at the town level, such as Town C of City B in Prefecture A, which is a lower-level area of City B.
FIG. 5 is a flowchart outlining the processing in the host system 13, taking as an example the estimation of the population in each deposit and savings bracket of FIG. 4(b).
FIG. 6 is a conceptual diagram showing how the conversion formula f(x) differs depending on the result of comparing E(a) and E(b).
FIG. 7 is a conceptual diagram of the process of estimating the missing information from the calculated population distribution h(b) by deposit and savings amount for Town C, City B, Prefecture A.
FIG. 8 is a diagram showing an example of a frequency distribution used in the information processing of the present embodiment, in which the objects are a large number of LEDs formed on a semiconductor substrate, the classes are emission wavelength bands, and the frequency is the number of LEDs on the substrate having the corresponding emission wavelength band.

An information processing technique according to an embodiment of the present invention is described in detail below with reference to the drawings.
FIG. 1 is a system configuration diagram including an information processing apparatus according to an embodiment of the present invention. The information system of this embodiment includes a host system 13, such as a personal computer, provided with a frequency distribution (probability distribution) information database concerning characteristics such as regional information, that is, how frequently a given characteristic value occurs. Here, as an example, this database is the regional information database 11 described later with reference to FIG. 4. An information input/output device 12, such as a personal computer or smartphone, is connected to the host system 13 and exchanges information between the host system 13 and the user.
The host system 13 includes the following units. A representative value comparison unit 131 compares the representative values (average, median, and so on), for example the two average values, of a first region (for example, an upper-level area) and a second region (for example, its lower-level area) stored in the frequency distribution information database such as the regional information database. A distribution change value calculation unit (conversion formula calculation unit) 132 calculates a distribution change value, for example a conversion formula, for estimating the frequency distribution of the lower-level area, which is the second region (group 2), from the frequency distribution of the upper-level area, which is the first region (group 1). A distribution conversion unit (conversion processing unit) 133 uses the conversion formula or the like to convert the frequency distribution of the upper-level area and thereby estimate the frequency distribution of the lower-level area. A distribution estimation unit (bootstrap processing unit) 134 performs distribution estimation processing (bootstrap processing) to calculate the average value of the entire lower-level area from its estimated frequency distribution. A frequency distribution (unknown value) calculation unit 135 calculates the frequency distribution of the lower-level area, such as its unknown attribute values, from the estimated frequency distribution of the lower-level area.

FIG. 2 is a diagram showing an example of a frequency distribution used in the information processing of the present embodiment. FIG. 2(a) is the frequency distribution table of group 1, showing the relationship between each class and the frequency belonging to it, the total frequency, and the representative value (here, the average). FIG. 2(b) is the frequency distribution table of group 2, showing the same items. As an example, as shown in FIG. 8, group 1 (FIG. 8(a)) is a large number of LEDs formed on a semiconductor substrate (one wafer), where the classes are emission wavelength bands and the frequency is the number of LEDs on the substrate having each emission wavelength band. Group 2 (FIG. 8(b)) is the LEDs in a certain region of the substrate (for example, the central region), where the classes are emission wavelength bands and the frequency is the number of LEDs in that central region having each emission wavelength band. Classes and frequencies can be applied to any characteristic for which a frequency distribution can be created.
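As a rough illustration of the tables in FIG. 2, a frequency distribution can be held as a list of class values and a list of frequencies, from which the total frequency and the average follow directly. The sketch below is not taken from the patent: the name `FrequencyTable`, the use of a single representative value per class, and the wavelength figures are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FrequencyTable:
    """A frequency distribution: one representative value per class
    (assumed here to be the class midpoint) and the frequency in each class."""
    class_values: list
    frequencies: list

    def total(self):
        # Total frequency (the "total" row of the table).
        return sum(self.frequencies)

    def mean(self):
        # Representative value: frequency-weighted mean of the class values.
        weighted = sum(v * f for v, f in zip(self.class_values, self.frequencies))
        return weighted / self.total()


# Hypothetical group 1: emission wavelength classes (nm) and LED counts.
group1 = FrequencyTable([300.0, 330.0, 360.0, 390.0], [20, 50, 20, 10])
print(group1.total(), group1.mean())
```

For group 2, only `total()` and `mean()` would be known in the scenario described here; the per-class frequencies are what the processing tries to estimate.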

FIG. 3 is a flowchart showing the flow of information processing according to the present embodiment. Assume that, of the data shown in FIGS. 2(a) and 2(b), the values y1' to y4' of group 2 are unknown, while the total frequency and the representative value (average) are known. Assume also that these data are stored in the frequency distribution information database 11.

As shown in FIG. 3, when processing starts (step S1), in step S2 the frequency distribution g(a) of group 1 and the representative values E(a) and E(b) of groups 1 and 2 are obtained from the frequency distribution information database 11 and input to the host system 13. In step S3, the representative value comparison unit 131 compares E(a) with E(b). If E(a) is substantially equal to E(b) (YES in step S3), the frequency distribution (unknown value) calculation unit 135 estimates the missing information of group 2 in step S4 by applying the frequency distribution of group 1 directly to group 2, and processing ends (step S6). If the result of step S3 is NO, processing proceeds to step S7, where the representative value comparison unit 131 determines whether E(a) > E(b). If YES, processing proceeds to step S8, where the distribution conversion unit 133 converts the frequency distribution of group 2 toward the low-gradation side using the change value calculated by the distribution change value calculation unit 132. Then, in step S9, the representative value E'(a) is obtained from the converted frequency distribution. If the result of step S7 is NO, processing proceeds to step S10, where the frequency distribution of group 2 is converted toward the high-gradation side. Then, in step S9, the representative value E'(a) is obtained from the converted frequency distribution.

The processing is then repeated by the distribution estimation unit 134: in step S11, E(a) is replaced with E'(a) and g(a) with g'(a), and processing returns to step S3. When the representative value comparison unit 131 determines in step S3 that E(a) is substantially equal to E(b), processing proceeds from step S4 to step S5, where the frequency distribution calculation unit 135 obtains the final frequency distribution, and processing ends.
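The loop of steps S3 to S11 can be sketched as follows. This is only a minimal sketch under stated assumptions: the text does not specify how the change value is computed, so the `shift` rule below (moving a fraction of each class's frequency into the neighbouring class, with the fraction halved on overshoot) is a hypothetical stand-in, and the function names, tolerance, and sample data are likewise invented for illustration.

```python
def mean(values, freqs):
    """Representative value: frequency-weighted mean of the class values."""
    return sum(v * f for v, f in zip(values, freqs)) / sum(freqs)


def shift(freqs, step, toward_low):
    """Hypothetical conversion: move a fraction `step` of every class's
    frequency into the neighbouring lower (or higher) class."""
    last = len(freqs) - 1
    out = [0.0] * len(freqs)
    for i, f in enumerate(freqs):
        j = max(i - 1, 0) if toward_low else min(i + 1, last)
        out[i] += f * (1.0 - step)
        out[j] += f * step
    return out


def estimate_distribution(values, g_a, e_b, tol=1e-3, step=0.1, max_iter=1000):
    """Convert g(a) repeatedly until its mean is close to the known E(b)."""
    g = [float(f) for f in g_a]
    e_a = mean(values, g)
    previous_direction = None
    for _ in range(max_iter):
        if abs(e_a - e_b) <= tol:          # S3: representative values agree
            break
        toward_low = e_a > e_b             # S7: which way to convert
        if previous_direction is not None and previous_direction != toward_low:
            step /= 2.0                    # overshoot: use a smaller change value
        g = shift(g, step, toward_low)     # S8 / S10: convert the distribution
        e_a = mean(values, g)              # S9: new representative value E'(a)
        previous_direction = toward_low    # S11: g'(a), E'(a) replace g(a), E(a)
    return g, e_a


values = [300.0, 330.0, 360.0, 390.0]      # hypothetical class values (nm)
g_a = [20, 50, 20, 10]                     # hypothetical group 1 frequencies
h_b, e_est = estimate_distribution(values, g_a, e_b=350.0)
print(h_b, e_est)
```

Many distributions share the same mean, so the result depends on the conversion rule chosen; the sketch shows only the control flow of compare, convert, and re-evaluate.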

In this way, the missing information of group 2, namely y1' to y4', can be estimated accurately from the frequency distribution of group 1 and the representative values of groups 1 and 2. In the example shown in FIG. 8, the average changes from 330 nm to 360 nm, so in going from group 1 to group 2 the frequency distribution is converted in the direction that shifts it toward the long-wavelength side.

According to the present embodiment, even when part of the attribute value information of one of the different regions is missing, the missing part is estimated from the frequency distribution and attribute average value of one region and the attribute average value of the other, and is provided to the user, which enables feature analysis based on the estimate.

Next, a more specific example is described. FIG. 4(a) shows tabulated deposit and savings data at the municipality level, such as City B in Prefecture A, which is the upper-level area. FIG. 4(b) shows tabulated deposit and savings data at the town level, such as Town C of City B in Prefecture A, which is a lower-level area of City B. The "-" entries in FIG. 4(b) indicate that no data is available, that is, that information is missing. In other words, at the lower town level such as Town C, the population is not tabulated by deposit and savings bracket, and only the total population of the area and the average deposit and savings amount are available. Information that has not been tabulated for the lower-level area in this way is treated as missing information. The following describes a processing procedure aimed at estimating the missing information in FIG. 4(b), namely the population of Town C in each deposit and savings bracket.

FIG. 5 is a flowchart outlining the processing in the host system 13, taking as an example the estimation of the population in each deposit and savings bracket of FIG. 4(b). As the initial setting, g(a) is set to the population distribution by deposit and savings amount tabulated for City B, Prefecture A, and h(b) is set to the population distribution by deposit and savings amount for Town C of City B, which is the target of the estimation (step 301). Next, to judge the relationship between g(a) and h(b), the average value E(a), an example of a representative value of the deposit and savings amount for City B, is compared with the average value E(b), an example of a representative value of the deposit and savings amount for Town C (steps 302 and 303). When the comparison gives E(a) ≈ E(b), the shapes of the population distributions by deposit and savings amount for City B and Town C are judged to be almost the same, h(b) = g(a) is assumed, and processing proceeds to step 310 (step 304). When the comparison gives E(a) > E(b) or E(a) < E(b), a conversion formula f(x) is generated so that the population distribution h(b) for Town C can be estimated by converting the population distribution g(a) for City B. The conversion formula f(x) expresses the relationship between g(a) and h(b), and becomes f(x) = x when h(b) = g(a). When the comparison gives E(a) > E(b), the population distribution g(a) of City B, which has the larger average, is moved toward the negative side as a whole, lowering the average and bringing E(a) closer to E(b).

For this reason, the conversion formula f(x) used to convert the population distribution g(a) of City B into the population distribution h(b) of Town C is convex upward compared with f(x) = x. The conversion formula f(x) is generated on this basis (step 305). When the comparison gives E(a) < E(b), the population distribution g(a) of City B, which has the smaller average, is moved toward the positive side as a whole, raising the average and bringing E(a) closer to E(b). For this reason, the conversion formula f(x) used to convert g(a) into h(b) is convex downward compared with f(x) = x. The conversion formula f(x) is generated on this basis (step 306).
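The text does not state which quantity f(x) is applied to. One reading that reproduces the described behaviour is to apply f to the cumulative relative frequency: a curve lying above f(x) = x (convex upward) pushes mass into the lower classes and lowers the average, while a curve lying below it (convex downward) does the opposite. The sketch below follows that reading and uses a power function f(x) = x^γ purely as an assumed example of such a curve; `apply_conversion`, `gamma`, and the sample data are hypothetical.

```python
def mean(values, freqs):
    """Frequency-weighted mean of the class values."""
    return sum(v * f for v, f in zip(values, freqs)) / sum(freqs)


def apply_conversion(freqs, gamma):
    """Apply f(x) = x ** gamma to the cumulative relative frequency.

    gamma < 1: f is convex upward   (mass moves to lower classes),
    gamma = 1: f(x) = x             (distribution unchanged),
    gamma > 1: f is convex downward (mass moves to higher classes).
    """
    total = sum(freqs)
    cumulative = []
    running = 0.0
    for f in freqs:
        running += f
        cumulative.append(running / total)
    cumulative[-1] = 1.0                       # guard against rounding error
    converted = [c ** gamma for c in cumulative]
    # Difference the converted cumulative curve back into class frequencies.
    out = []
    previous = 0.0
    for c in converted:
        out.append((c - previous) * total)
        previous = c
    return out


values = [300.0, 330.0, 360.0, 390.0]          # hypothetical class values
g_a = [20, 50, 20, 10]                         # hypothetical upper-level distribution
lowered = apply_conversion(g_a, 0.5)           # case E(a) > E(b)
raised = apply_conversion(g_a, 2.0)            # case E(a) < E(b)
print(mean(values, lowered), mean(values, g_a), mean(values, raised))
```

Because f(0) = 0 and f(1) = 1 for any positive `gamma`, the total frequency is preserved and only its allocation across classes changes.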

FIG. 6 is a conceptual diagram showing how the conversion formula f(x) differs depending on the result of comparing E(a) and E(b). The upper diagram (42) depicts the convex-upward conversion formula f(x) for E(a) > E(b), and the lower diagram (43) depicts the convex-downward conversion formula f(x) for E(a) < E(b). Next, using the generated conversion formula f(x), the estimated population distribution g'(a) by deposit and savings amount for Town C is obtained from the population distribution g(a) for City B (step 307). Next, the bootstrap method is used to obtain, from g'(a), the average deposit and savings amount E'(a) of the population, Town C (step 308). Next, g'(a) replaces g(a) and E'(a) replaces E(a), and the result is evaluated by comparing it again, through the numerical comparison unit, with the actual average deposit and savings amount E(b) of Town C (step 302). This processing is repeated until E(a) ≈ E(b) is reached. Once the estimation of the population distribution h(b) for Town C is complete (YES in step 302), the missing part of the population by deposit and savings amount for Town C is estimated from h(b) via step 304 (step 310), and processing ends.
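Step 308 names the bootstrap method but gives no parameters. A minimal sketch of a bootstrap estimate of the mean from a frequency distribution is shown below; drawing class values with probability proportional to their frequency is equivalent to resampling with replacement from the expanded sample. The function name, the resample sizes, the fixed seed, and the data are assumptions for illustration.

```python
import random


def bootstrap_mean(values, freqs, n_samples=1000, n_resamples=200, seed=0):
    """Estimate the population mean by repeated resampling with replacement."""
    rng = random.Random(seed)                  # fixed seed for reproducibility
    resample_means = []
    for _ in range(n_resamples):
        # Draw class values with probability proportional to their frequency.
        sample = rng.choices(values, weights=freqs, k=n_samples)
        resample_means.append(sum(sample) / n_samples)
    # The bootstrap estimate is the average of the resample means.
    return sum(resample_means) / n_resamples


values = [300.0, 330.0, 360.0, 390.0]          # hypothetical class values
g_prime = [20, 50, 20, 10]                     # hypothetical estimated distribution g'(a)
print(bootstrap_mean(values, g_prime))
```

The spread of `resample_means` could also be used to judge how tightly the average is determined, although the text does not say whether the apparatus does so.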

FIG. 7 is a conceptual diagram of the process of estimating the missing information from the calculated population distribution h(b) by deposit and savings amount for Town C, City B, Prefecture A. First, the estimated population ratio 52 for each deposit and savings bracket in Town C is obtained from the population distribution h(b) 51. Next, the estimated population 54 for each deposit and savings bracket is calculated by combining the estimated ratio 52 with the total population 53 of Town C. Through this processing, the population of Town C in each deposit and savings bracket, which was the missing information, is estimated and provided to the user.
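The calculation in FIG. 7 amounts to normalising the estimated distribution into ratios and scaling by the known total population. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def estimate_counts(distribution, total_population):
    """Turn an estimated distribution (51) into per-bracket ratios (52) and
    scale them by the known total population (53) to get estimated counts (54)."""
    total = sum(distribution)
    ratios = [f / total for f in distribution]
    return [r * total_population for r in ratios]


h_b = [4.0, 45.0, 32.0, 19.0]                  # hypothetical estimated distribution h(b)
counts = estimate_counts(h_b, 2000)            # hypothetical total population of Town C
print(counts)
```

Rounding the results to whole persons is left out here; if integer counts are needed, the rounding should be done so that the counts still sum to the total population.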

According to the present embodiment, even when part of the attribute value information of the lower-level area is missing, the missing part is estimated from the frequency distribution and attribute average value of the upper-level area and the attribute average value of the lower-level area, and is provided to the user. This makes it possible to use regional information for a more detailed analysis of regional characteristics.

The configurations and the like illustrated in the accompanying drawings for the above embodiment are not limiting, and can be changed as appropriate within the range in which the effects of the present invention are obtained. Other modifications can also be made as appropriate without departing from the scope of the object of the present invention. Each component of the present invention can be freely selected, and an invention having the selected configuration is also included in the present invention.

The present invention can be used as an information processing apparatus.

11 … Regional information database
12 … Input/output device
13 … Host system
131 … Representative value comparison unit
132 … Conversion formula calculation unit
133 … Conversion processing unit
134 … Bootstrap processing unit
135 … Unknown value calculation unit
21 … Example of deposit and savings data for City B, Prefecture A
22 … Example of deposit and savings data for Town C, City B, Prefecture A
41 … Diagram of f(x) = x
42 … Diagram in which f(x) = x is made convex upward
43 … Diagram in which f(x) = x is made convex downward
51 … Calculated population distribution by deposit and savings amount for Town C, City B, Prefecture A
52 … Estimated population ratio by deposit and savings amount for Town C, City B, Prefecture A
53 … Total population of Town C, City B, Prefecture A
54 … Estimated population by deposit and savings amount for Town C, City B, Prefecture A

Claims

1. An information processing apparatus comprising:
a representative value comparison unit that compares a first representative value of a first frequency distribution in a first region with a second representative value of a second frequency distribution in a second region different from the first region;
a distribution conversion unit that, in accordance with the comparison in the representative value comparison unit, converts the second frequency distribution in a direction that reduces the deviation between the first frequency distribution and the second frequency distribution; and
a distribution estimation unit that obtains a new second representative value based on the converted second frequency distribution produced by the distribution conversion unit, has the representative value comparison unit compare the new second representative value with the first representative value, and continues the processing of obtaining the second representative value so that the degree of coincidence between the first representative value and the new second representative value becomes higher.

2. The information processing apparatus according to claim 1, further comprising a frequency distribution calculation unit that calculates the frequencies of the second frequency distribution based on the second representative value and the frequency distribution estimated by the distribution estimation unit.

3. An information processing apparatus having a regional information database, the apparatus comprising:
a comparison unit that compares two average values stored in advance in the regional information database, one for an upper section area and one for a lower section area;
a conversion unit that, based on the comparison result, obtains a conversion method that approximates the frequency distribution of the upper section area to the frequency distribution of the lower section area;
an estimation unit that estimates the frequency distribution of the lower section area from the frequency distribution of the upper section area using the conversion method; and
a bootstrap processing unit that performs bootstrap processing to obtain a population average value from the estimated frequency distribution.

4. A program for causing a computer to execute an information processing method, the method comprising:
a representative value comparison step of comparing a first representative value of a first frequency distribution in a first region with a second representative value of a second frequency distribution in a second region different from the first region;
a distribution conversion step of converting, in accordance with the comparison in the representative value comparison step, the second frequency distribution in a direction that reduces the deviation between the first frequency distribution and the second frequency distribution; and
a distribution estimation step of obtaining a new second representative value based on the converted second frequency distribution produced in the distribution conversion step, comparing the new second representative value with the first representative value in the representative value comparison step, and continuing the processing of obtaining the second representative value so that the degree of coincidence between the first representative value and the new second representative value becomes higher.