JP5032286B2 - Filtering processing method, filtering processing program, and filtering apparatus

Info

Publication number: JP5032286B2
Application number: JP2007318833A
Authority: JP
Inventors: 広樹谷岡
Original assignee: 株式会社ジャストシステム
Priority date: 2007-12-10
Filing date: 2007-12-10
Publication date: 2012-09-26
Anticipated expiration: 2027-12-10
Also published as: JP2009140437A

Description

The present invention relates to a filtering processing method, a filtering processing program, and a filtering apparatus that determine whether or not processing target data is desired data.

Conventionally, filtering processing has been widely provided in which, when a user uses a filtering apparatus having a predetermined determination function, the filtering function is improved by having the apparatus learn from the results of that use. For example, in some approaches the learning method used in Bayesian networks is applied to the learning of the filtering function. In this learning method, the filtering apparatus being trained requires binary features as its learning inputs, so continuous values are discretized with a predetermined threshold and supplied as input values.

More specifically, several suitable candidate thresholds are first set in advance in order to determine the threshold used for discretization. Binary features are then extracted by discretizing the continuous values with each of the set thresholds. Next, the output probability of the extracted binary features is calculated for each threshold. Based on these results, features that do not contribute to classification into the categories are eliminated. Because this processing narrows down the number of binary features, the amount of computation required to run the Bayesian network learning method can be reduced (see, for example, Patent Document 1 below).
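The conventional procedure described above can be illustrated with a minimal Python sketch. It is not taken from Patent Document 1: the function names, the use of a per-class firing rate as the "output probability", and the `min_gap` cut-off for discarding ineffective features are all assumptions made for illustration.

```python
def binarize(values, threshold):
    """Turn continuous values into binary features with one candidate threshold."""
    return [1 if v >= threshold else 0 for v in values]


def firing_rate_by_class(values, labels, threshold):
    """Assumed stand-in for the output probability: how often the binary
    feature is 1 within each category."""
    bits = binarize(values, threshold)
    rates = {}
    for cls in set(labels):
        cls_bits = [b for b, y in zip(bits, labels) if y == cls]
        rates[cls] = sum(cls_bits) / len(cls_bits)
    return rates


def select_thresholds(values, labels, candidates, min_gap=0.2):
    """Keep only thresholds whose binary feature separates the categories.
    min_gap is a hypothetical criterion for 'effective' features."""
    kept = []
    for t in candidates:
        rates = firing_rate_by_class(values, labels, t)
        if max(rates.values()) - min(rates.values()) >= min_gap:
            kept.append(t)
    return kept


scores = [0.1, 0.2, 0.8, 0.9]
labels = ["ham", "ham", "spam", "spam"]
useful = select_thresholds(scores, labels, candidates=[0.0, 0.5, 1.0])
```

In this toy data only the candidate 0.5 survives, because the thresholds 0.0 and 1.0 produce the same bit for every sample. The sketch also shows the weakness noted in the text: the candidate list itself still has to be chosen by hand.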

Patent Document 1: JP 2004-326465 A

In general, when binary features are used as inputs to a filtering apparatus being trained, features whose output probabilities are spread out when their distribution is examined are considered useful as input values. However, with the technique described in Patent Document 1, there is a problem in that the distribution of the output probabilities of the binary features calculated for each threshold becomes biased.

Moreover, not only in the learning method described above but whenever values discretized with a prepared threshold are used, the choice of threshold greatly affects the processing. Prior trial and error is therefore indispensable for setting the threshold. In addition, if the determination tendency of the filtering apparatus changes significantly during learning, the threshold setting must also be revised. Thus, a filtering apparatus to which a conventional learning method is applied has the problem that threshold setting places a heavy burden on the user.

To solve the above problems of the prior art, an object of the present invention is to provide a filtering processing method, a filtering processing program, and a filtering apparatus that reduce the burden of setting and have a function of learning to improve processing accuracy efficiently.

To solve the above problems and achieve the object, the filtering processing method according to the invention of claim 1 is a filtering processing method for determining whether or not processing target data is data desired by a user, the method including: an analysis step of analyzing the elements constituting the processing target data; a first calculation step of calculating, for each element analyzed in the analysis step, the probability that the element belongs to the processing target data; a first discretization step of discretizing each element analyzed in the analysis step into a valid value or an invalid value based on the probability calculated in the first calculation step; a second calculation step of calculating, for each element including the elements discretized into valid/invalid values in the first discretization step, the probability that the element belongs to the processing target data; and a second discretization step of determining whether or not the processing target data is the desired data by discretizing each element into a valid value or an invalid value based on the probability calculated in the second calculation step.

According to the invention of claim 1, each element is discretized into a binary feature, valid or invalid, based on the output probabilities of the elements constituting the processing target data. The discretization result is used to determine whether the processing target data is data desired by the user. That is, discretization can be performed from the calculation results even if the user does not prepare parameters such as a threshold. In addition, because the result of the first discretization step is reflected in the subsequent second discretization, highly accurate determination is possible.

The filtering processing method according to the invention of claim 2 is, in the invention of claim 1, characterized by including: a reception step of receiving whether the determination made in the second discretization step was correct or incorrect; and an adjustment step of adjusting the probability calculated in the first calculation step for each element constituting processing target data that the reception step received as misjudged.

According to the invention of claim 2, when the determination result of the filtering processing does not match the data desired by the user, the content of the misjudgment is fed back. Specifically, the output probabilities of the elements calculated in the first calculation step are adjusted. Consequently, when data having the same composition as the misjudged processing target data is filtered again, it is determined not to be data desired by the user, so the determination accuracy can be improved.

The filtering processing method according to the invention of claim 3 is, in the invention of claim 1 or 2, characterized in that at least one of the first discretization step and the second discretization step performs the discretization using values obtained by mapping each element through an arbitrary function.

According to the invention of claim 3, the function transformation emphasizes the behavior of the output probability distribution of the elements to be discretized, which makes the discretization easier to adjust.

The filtering processing method according to the invention of claim 4 is, in the invention of claim 1 or 2, characterized in that at least one of the first discretization step and the second discretization step discretizes each element into a valid value or an invalid value based on the result of comparing the probability of the element with a preset threshold.

According to the invention of claim 4, when an optimal threshold is already known, that threshold can be set and used for the discretization.

The filtering processing method according to the invention of claim 5 is, in the invention of any one of claims 1 to 4, characterized in that, when the processing target data is e-mail data, the analysis step analyzes the header and body of the e-mail data.

According to the invention of claim 5, even when a large, unspecified number of e-mails are sent, mail not desired by the user can be excluded by using the filtering processing method.

The filtering processing program according to the invention of claim 6 is a filtering processing program that causes a computer to determine whether or not processing target data is data desired by a user, the program causing the computer to execute: an analysis step of analyzing the elements constituting the processing target data; a first calculation step of calculating, for each element analyzed in the analysis step, the probability that the element belongs to the processing target data; a first discretization step of discretizing each element analyzed in the analysis step into a valid value or an invalid value based on the probability calculated in the first calculation step; a second calculation step of calculating, for each element including the elements discretized into valid/invalid values in the first discretization step, the probability that the element belongs to the processing target data; and a second discretization step of determining whether or not the processing target data is the desired data by discretizing each element into a valid value or an invalid value based on the probability calculated in the second calculation step.

According to the invention of claim 6, each element is discretized into a binary feature, valid or invalid, based on the output probabilities of the elements constituting the processing target data. The discretization result is used to determine whether the processing target data is data desired by the user. That is, discretization can be performed from the calculation results even if the user does not prepare parameters such as a threshold. In addition, because the result of the first discretization step is reflected in the subsequent second discretization, highly accurate determination is possible.

The filtering processing program according to the invention of claim 7 is, in the invention of claim 6, characterized by causing the computer to execute: a reception step of receiving whether the determination made in the second discretization step was correct or incorrect; and an adjustment step of adjusting the probability calculated in the first calculation step for each element constituting processing target data that the reception step received as misjudged.

According to the invention of claim 7, when the determination result of the filtering processing does not match the data desired by the user, the content of the misjudgment is fed back. Specifically, for each element contained in the misjudged data, the output probability calculated in the first calculation step is adjusted. Consequently, when data having the same composition as the misjudged processing target data is filtered again, it is determined not to be data desired by the user, so the determination accuracy can be improved.

The filtering apparatus according to the invention of claim 8 is a filtering apparatus that determines whether or not processing target data is data desired by a user, the apparatus including: analysis means for analyzing the elements constituting the processing target data; first calculation means for calculating, for each element analyzed by the analysis means, the probability that the element belongs to the processing target data; first discretization means for discretizing each element analyzed by the analysis means into a valid value or an invalid value based on the probability calculated by the first calculation means; second calculation means for calculating, for each element including the elements discretized into valid/invalid values by the first discretization means, the probability that the element belongs to the processing target data; second discretization means for determining whether or not the processing target data is the desired data by discretizing each element into a valid value or an invalid value based on the probability calculated by the second calculation means; reception means for receiving whether the determination made by the second discretization means was correct or incorrect; and adjustment means for adjusting the probability calculated by the first calculation means for each element constituting processing target data that the reception means received as misjudged.

According to the invention of claim 8, each element is discretized into a binary feature, valid or invalid, based on the output probabilities of the elements constituting the processing target data. Output probabilities are then calculated again using this discretization result, and discretization is performed once more using those results to determine whether the processing target data is data desired by the user. Furthermore, the determination result is fed back into subsequent discretization. That is, discretization can be performed from the calculation results even if the user does not prepare parameters such as a threshold, and the determination accuracy can also be improved.

The filtering processing method, filtering processing program, and filtering apparatus according to the present invention have the effect of reducing the burden of setting while realizing a function of learning to improve processing accuracy efficiently.

Preferred embodiments of the filtering processing method, filtering processing program, and filtering apparatus according to the present invention are described in detail below with reference to the accompanying drawings.

(Overview of filtering process)
First, an overview of the filtering processing in the filtering processing method, filtering processing program, and filtering apparatus according to the present invention is described. FIG. 1 is an explanatory diagram showing an overview of the filtering processing according to the present invention.

In FIG. 1, the received mail 101 is first analyzed by the analysis unit 110. The mail 101 is then input to two types of filters connected in cascade, the child filter 120 and the parent filter 130, which output a single determination result. The child filter 120 blocks mail based on determination criteria set to suit the user environment, whereas the parent filter 130 blocks unknown mail.

For the mail 101 that has passed through the two filters, the child filter 120 and the parent filter 130, the user 102 judges whether it was filtered correctly. If the filters 120 and 130 made a determination error, this determination error information is fed back to the child filter 120, which adjusts its determination criteria based on it. Through this feedback, the child filter 120 comes to make determinations that better match the user environment.

As described above, in the filtering processing of the present invention, each time filtering is performed the child filter 120 can learn from the user determination information whether its processing was appropriate. As a result, the processing capability of the child filter 120 improves, and the determination by the parent filter 130 comes to play little more than the role of confirming the determination result of the child filter 120.

The following embodiment describes a specific configuration for realizing a filtering apparatus that executes the filtering processing described above, together with the details of that processing.

(Hardware configuration of filtering apparatus)
First, the hardware configuration of the filtering apparatus according to the present embodiment is described. FIG. 2 is a block diagram showing an example of the hardware configuration of the filtering apparatus according to the present embodiment.

In FIG. 2, the filtering apparatus 200 includes a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, an HDD (Hard Disc Drive) 204, an HD (Hard Disc) 205, an FDD (Flexible Disk Drive) 206, an FD (Flexible Disk) 207, a CD-RW (Compact Disc ReWritable) drive 208, a CD-RW 209, a display 210, a keyboard 211, a mouse 212, a network I/F (interface) 213, a communication cable 214, a printer 215, and a bus 220.

The CPU 201 controls the filtering apparatus 200 as a whole. The ROM 202 stores various control programs, the filtering processing program according to the present invention, and the like. The RAM 203 stores variable data in a rewritable manner and functions as a work area for the CPU 201. The HDD 204 controls reading and writing of data on the HD 205 under the control of the CPU 201. The HD 205 stores data written under the control of the HDD 204.

The FDD 206 controls reading and writing of data on the FD 207 under the control of the CPU 201. The FD 207 is removable and stores data written under the control of the FDD 206. The CD-RW drive 208 controls reading and writing of data on the CD-RW (or CD-R or CD-ROM) 209 under the control of the CPU 201. The CD-RW 209 is removable and stores data written under the control of the CD-RW drive 208.

The display 210 displays a cursor, menus, windows, and various data such as characters and images. The keyboard 211 has a plurality of keys for entering characters, numerical values, various instructions, and the like. The mouse 212 is used to select and execute various instructions, select processing targets, move the mouse pointer, and so on. The network I/F 213 is connected to a network such as a LAN, a WAN, or the Internet via the communication cable 214 and functions as an interface between that network and the CPU 201. The printer 215 prints various data such as characters and images. The bus 220 connects the above components.

(Functional configuration of filtering apparatus)
Next, the functional configuration of the filtering apparatus 200 according to the present embodiment is described. FIG. 3 is a block diagram showing the functional configuration of the filtering apparatus according to the present embodiment. As shown in FIG. 3, the filtering apparatus 200 includes an analysis unit 310, a first filter consisting of a first calculation unit 320 and a first discretization unit 330, a second filter consisting of a second calculation unit 340 and a second discretization unit 350, a reception unit 360, and an adjustment unit 370.

The analysis unit 310 analyzes the elements constituting the processing target data 301. Element analysis is processing that divides the continuous values making up the processing target data into elements having a given meaning. For example, text data making up a sentence is analyzed into individual word elements. The analysis unit 310 may also restrict the analysis to data containing elements that affect the filtering determination; for example, when the target data is e-mail data, it analyzes the header and body of the e-mail data.
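As a rough illustration of this kind of element analysis, the following Python sketch splits an e-mail into word elements taken from selected header fields and from the body. It uses the standard library `email` parser; the choice of the `From` and `Subject` fields, the `\w+` tokenization, and the `field:token` naming are assumptions for illustration and are not specified in this document.

```python
import re
from email import message_from_string


def analyze(raw_mail):
    """Split an e-mail into elements (lower-cased word tokens).
    Header fields and tokenization rule are illustrative assumptions."""
    msg = message_from_string(raw_mail)
    elements = []
    for name in ("From", "Subject"):
        value = msg.get(name, "")
        elements += [f"{name.lower()}:{tok}"
                     for tok in re.findall(r"\w+", value.lower())]
    body = msg.get_payload()
    if isinstance(body, str):  # multipart bodies are skipped in this sketch
        elements += [f"body:{tok}"
                     for tok in re.findall(r"\w+", body.lower())]
    return elements


raw = "From: a@example.com\nSubject: Cheap offer\n\nBuy now\n"
elements = analyze(raw)
```

Prefixing each token with its origin keeps a word in the subject line distinct from the same word in the body, which is one simple way to let later stages weigh header and body evidence separately.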

The first calculation unit 320 calculates, for each element analyzed by the analysis unit 310, its output probability in the processing target data 301. Any method may be used by the first calculation unit 320 to calculate the output probability.

The first discretization unit 330 discretizes each element analyzed by the analysis unit 310 into a valid value or an invalid value based on the output probability calculated by the first calculation unit 320. In doing so, the first discretization unit 330 may perform the discretization using values obtained by mapping each element through an arbitrary function, for example a sigmoid function. Applying such a function emphasizes the distribution of the output probabilities, so that valid and invalid values can be distinguished easily.
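The effect of mapping output probabilities through a sigmoid before discretizing can be sketched as follows. The `gain` and `midpoint` parameters and the cut at 0.5 are assumptions chosen for illustration; the document only states that an arbitrary function such as a sigmoid may be used.

```python
import math


def sigmoid(x, gain=10.0, midpoint=0.5):
    """Steep sigmoid centred on `midpoint`; gain and midpoint are assumed values."""
    return 1.0 / (1.0 + math.exp(-gain * (x - midpoint)))


def discretize(probabilities, gain=10.0):
    """Map each element's output probability through the sigmoid and
    assign 1 (valid) or 0 (invalid)."""
    return {element: 1 if sigmoid(p, gain) >= 0.5 else 0
            for element, p in probabilities.items()}


probs = {"offer": 0.6, "meeting": 0.4}
stretched = {e: sigmoid(p) for e, p in probs.items()}
bits = discretize(probs)
```

With these settings the probabilities 0.6 and 0.4 are pushed apart to roughly 0.73 and 0.27, which is the "emphasis" described above. Because the sigmoid is monotonic, a fixed cut at 0.5 gives the same valid/invalid split as comparing the raw probability with `midpoint`; the mapping mainly makes the separation between the two groups easier to see and adjust.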

As described above, the first discretization unit 330 does not require the threshold setting that conventional discretization needs. However, if the user has threshold information suited to the filtering processing, that threshold may be used. In that case, the first discretization unit 330 discretizes each element into a valid value or an invalid value based on the result of comparing the output probability of the element with the preset threshold.

The second calculation unit 340 adds the elements discretized into valid/invalid values by the first discretization unit 330 to the elements constituting the processing target data, and calculates the output probability of each element including the added elements. The method by which the second calculation unit 340 calculates the output probability is arbitrary, as with the first calculation unit 320, but a calculation method different from that of the first calculation unit 320 is applied.
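One possible reading of this step is sketched below in Python: the first-stage valid/invalid results are appended to the original elements as extra features, and a second, different calculation is run over the combined set. The `flag:` naming, the log-odds weight table, and the logistic combination are all hypothetical; the document leaves the second calculation method open.

```python
import math


def augment(elements, first_stage_bits):
    """Append a flag feature for every element the first stage marked valid (1)."""
    flags = [f"flag:{e}" for e in elements if first_stage_bits.get(e) == 1]
    return list(elements) + flags


def second_stage_probability(features, log_odds, prior_log_odds=0.0):
    """Assumed second calculation: sum per-feature log-odds and squash
    the total into a probability with the logistic function."""
    total = prior_log_odds + sum(log_odds.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-total))


elements = ["offer", "meeting"]
bits = {"offer": 1, "meeting": 0}            # output of the first discretization
features = augment(elements, bits)
weights = {"flag:offer": 2.0, "meeting": -1.0}  # hypothetical learned weights
probability = second_stage_probability(features, weights)
```

Treating the first-stage output as additional features is what lets the second filter take the first filter's judgment into account while still seeing the original elements.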

The second discretization unit 350 determines whether or not the processing target data 301 is the desired data by discretizing each element into a valid value or an invalid value based on the output probability calculated by the second calculation unit 340. When the second discretization unit 350 determines that the data is desired, the processing target data 301 is provided to the user.

Like the first discretization unit 330 described above, the second discretization unit 350 may perform the discretization using values obtained by mapping through an arbitrary function, and it may use a threshold if threshold information suited to the filtering processing is available.

The reception unit 360 receives user determination information 302 about the processing target data 301 from the user. The user determination information 302 is information indicating whether the determination made by the second discretization unit 350 was correct or incorrect.

When the reception unit 360 receives a misjudgment, that is, an indication that the processing target data 301 provided to the user was not data desired by the user, the adjustment unit 370 reflects this misjudgment in subsequent determination processing.

Specifically, the output probability calculated by the first calculation unit 320 is adjusted for each element constituting the processing target data 301 that was misjudged. As a result, the output probability of an element of the misjudged data (for example, element A) becomes lower. For processing target data 301 subsequently processed by the filtering apparatus 200, element A does not obtain a high output probability unless it appears more often than before, and the first discretization unit 330 downstream does not discretize it as a valid value. Therefore, when processing target data with the same composition is input again, the first discretization unit 330 discretizes its elements as invalid values, and the data is no longer determined to be data desired by the user.
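The feedback described above can be sketched with a count-based first calculation unit. This is an assumed realization, not the method of the document: the class name, the Laplace-smoothed ratio used as the output probability, and the update of one count per misjudged element are all illustrative choices.

```python
from collections import Counter


class FirstCalculator:
    """Hypothetical first calculation unit backed by per-element counts."""

    def __init__(self):
        self.desired = Counter()    # times an element appeared in desired data
        self.undesired = Counter()  # times it appeared in data reported as undesired

    def probability(self, element):
        """Smoothed estimate that the element belongs to desired data."""
        d, u = self.desired[element], self.undesired[element]
        return (d + 1) / (d + u + 2)

    def report_misjudgment(self, elements):
        """Adjustment step: the data was passed to the user but was not desired,
        so lower the output probability of each element it contained."""
        for element in set(elements):
            self.undesired[element] += 1


calc = FirstCalculator()
calc.desired["offer"] = 3
before = calc.probability("offer")
calc.report_misjudgment(["offer", "offer", "now"])
after = calc.probability("offer")
```

Here the probability for "offer" drops from 0.8 to about 0.67 after one report. Repeated reports keep lowering it, so under a fixed cut such as 0.5 the element would eventually be discretized as invalid, matching the behaviour described for element A.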

Of the components described above, the analysis unit 310 corresponds to the analysis unit 110 described with reference to FIG. 1. The first filter, formed by the first calculation unit 320 and the first discretization unit 330, constitutes the child filter 120 of FIG. 1. The second filter, formed by the second calculation unit 340 and the second discretization unit 350, constitutes the parent filter 130 of FIG. 1. The reception unit 360 and the adjustment unit 370 are the functional units that provide feedback for improving the accuracy of the filtering processing.

(Processing procedure of filtering apparatus)
Next, the processing procedure of the filtering apparatus 200 according to the present embodiment is described. FIG. 4 is a flowchart showing the processing procedure of the filtering apparatus according to the present embodiment. In the flowchart of FIG. 4, it is first determined whether or not processing target data 301 has been input to the filtering apparatus 200 (step S401).

In step S401, the device waits until the processing target data 301 is input (step S401: No loop). When the processing target data 301 is input (step S401: Yes), the analysis unit 310 analyzes the constituent elements of the processing target data 301 (step S402).

Once the data has been analyzed into its elements in step S402, processing moves on to discretizing each element for the filtering process. First, the first calculation unit 320 calculates the output probability of each element constituting the processing target data 301 (step S403). The first discretization unit 330 then performs discretization based on the output probabilities calculated in step S403 (step S404), which completes the filtering process in the first filter.

Next, the second calculation unit 340 calculates output probabilities from the discretization results of the elements discretized in step S404 together with the processing target data 301 (step S405). The second discretization unit 350 then discretizes each element from the output probabilities calculated in step S405 and determines whether the processing target data 301 is data desired by the user (step S406), which completes the filtering process in the second filter.

The processing up to step S406 described above completes the filtering process for the processing target data 301. After the filtering process for the processing target data 301 is finished, the filtering device 200 moves on to processing that reflects in the device itself whether the filtering process just performed was correct.
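Steps S402 to S406 can be sketched as a two-stage pipeline. This is only an illustration under stated assumptions: the whitespace tokenizer, the probability table `element_probs`, the default of 0.0 for unknown elements, the 0.5 thresholds, and the second-stage score (the fraction of elements discretized as valid) are hypothetical choices, since the specification leaves the concrete calculations open.

```python
def analyze(text):
    """Step S402: split the processing target data into elements
    (assumed here to be lowercase, whitespace-separated words)."""
    return text.lower().split()


def first_filter(elements, element_probs, threshold=0.5):
    """Steps S403-S404: look up each element's output probability and
    discretize it to 1 (valid value) or 0 (invalid value)."""
    return {e: 1 if element_probs.get(e, 0.0) >= threshold else 0
            for e in set(elements)}


def second_filter(elements, bits, threshold=0.5):
    """Steps S405-S406: compute a score for the whole data item from the
    discretized elements and decide whether it is data the user wants."""
    if not elements:
        return 0.0, False
    score = sum(bits[e] for e in elements) / len(elements)
    return score, score >= threshold


def filter_data(text, element_probs):
    """Run the first filter and feed its result, with the data, to the second."""
    elements = analyze(text)
    bits = first_filter(elements, element_probs)
    return second_filter(elements, bits)


# Hypothetical per-element output probabilities.
probs = {"meeting": 0.9, "schedule": 0.8, "lottery": 0.1}
print(filter_data("Meeting schedule lottery", probs))
print(filter_data("lottery lottery prize", probs))
```

The second stage never sees raw probabilities, only the binary output of the first stage plus the data itself, which is the property the procedure relies on when feedback is later applied to the first stage alone.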

First, the reception unit 360 accepts a judgment as to whether the determination result for the processing target data 301 made in step S405 is correct (step S407). This judgment is made by the user. If a judgment that the determination result is correct is accepted (step S407: Yes), there was no problem with the current filtering process, and the series of processes ends as it is.

On the other hand, if a judgment that the determination result is incorrect is accepted (step S407: No), there was a problem with the current filtering process. To correct it, the adjustment unit 370 adjusts the output probability calculation settings of the first calculation unit 320 (step S408), and the series of processes ends.

As described above, when a plurality of filters are connected in series in the filtering device 200, each filter makes its own determination, but the second filter at the subsequent stage receives the determination result of the first filter at the preceding stage together with the processing target data 201. With this procedure, the second filter takes the determination result of the first filter into account in addition to its own determination.

Furthermore, when the user wants to give feedback, the feedback is reflected in the first filter at the preceding stage, so that from the next time onward determinations can be made without repeating the earlier erroneous determination. In addition, when the determination result of the first filter is updated, the determination result of the second filter is automatically updated as well. Functions corresponding to thresholds between filters and to comparison of determination results can therefore all be controlled by adjusting the output probability calculation of the first filter.

(Discretization technique)
Next, the discretization technique used in the first discretization unit 330 and the second discretization unit 350 will be described. As noted above, the discretization technique used in the first discretization unit 330 and the second discretization unit 350 is not particularly limited. Here, mapping by an arbitrary function is described as one example of a simple and efficient technique. FIG. 5 is a diagram in which the output probability of a certain word is mapped by an arbitrary function. FIG. 6 is a diagram in which the output probability for each number of words is mapped by an arbitrary function.

FIGS. 5 and 6 show, based on the output probability values calculated by the first calculation unit 320 and the second calculation unit 340 (see FIG. 3), the distribution of the output probability of a certain word (FIG. 5) and the distribution of the output probability according to the number of words (FIG. 6) when mapped by an arbitrary function.

The three types of curves in FIGS. 5 and 6 (solid, broken, and dash-dot lines) represent the different functions applied. For example, the solid curve is a sigmoid function that strengthens the probability of a word as the output probability moves away from 0.5, and it is expected to give relatively standard behavior without quirks.

The broken curve is linear for the word probability shown in FIG. 5, but for the number of words shown in FIG. 6 it rises steeply from around 0.5, which indicates that learning results are readily reflected. The dash-dot curve is a function that is biased depending on whether the value is close to 0 or close to 1. With this function, determinations closer to 1 are more readily reflected in the learning result. In this way, the learning tendency can be controlled analytically through the function that is applied.
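As a rough illustration of such mapping functions, the sketch below defines three candidates loosely modeled on the three curves. The exact functions plotted in FIGS. 5 and 6 are not given in the text, so the `gain` and `power` parameters and the specific formulas are assumptions.

```python
import math


def sigmoid_map(p, gain=10.0):
    """Sigmoid centered at 0.5: pushes probabilities away from 0.5.
    The gain value is an assumed parameter."""
    return 1.0 / (1.0 + math.exp(-gain * (p - 0.5)))


def linear_map(p):
    """Identity mapping: the probability is used as-is."""
    return p


def biased_map(p, power=2.0):
    """Assumed example of a biased mapping: values near 1 keep more of
    their weight than values near 0."""
    return p ** power


for p in (0.2, 0.5, 0.8):
    print(p, round(sigmoid_map(p), 4), linear_map(p), round(biased_map(p), 4))
```

Swapping one mapping for another changes how strongly a given output probability influences the discretized result without altering the rest of the filter, which is the sense in which the applied function controls the learning tendency.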

As described above, when the filtering process according to the present invention is performed, discretization for the determination process of each filter is carried out using the input processing target data itself (through analysis, output probability calculation, and so on). The parameter setting that the user must perform in conventional filtering processes can therefore be greatly simplified.

When a determination error occurs, the user feeds the details of the error back to the filtering process. The filtering process thus learns and can make a more accurate determination the next time.

As described above, the filtering processing method, filtering processing program, and filtering device according to the present invention can reduce the burden of setting processing and realize a function that learns to improve processing accuracy efficiently.

In addition to the e-mail filtering described above, the filtering process of the present invention may be applied as a spam filter or a Web filter. The filtering function can also be applied to search engine profiles, to optimization of learning functions in natural language processing, and the like.

The filtering processing method described in the present embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. This program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD, and is executed by being read from the recording medium by the computer. The program may also be a transmission medium that can be distributed via a network such as the Internet.

As described above, the filtering processing method, filtering processing program, and filtering device according to the present invention are useful for filtering data consisting of continuous values, and are particularly suitable for mail filters that need to adapt to individual user environments.

FIG. 1 is an explanatory diagram showing an outline of the filtering process according to the present invention.
FIG. 2 is a block diagram showing an example of the hardware configuration of the filtering device according to the present embodiment.
FIG. 3 is a block diagram showing the functional configuration of the filtering device according to the present embodiment.
FIG. 4 is a flowchart showing the processing procedure of the filtering device according to the present embodiment.
FIG. 5 is a diagram in which the output probability of a certain word is mapped by an arbitrary function.
FIG. 6 is a diagram in which the output probability for each number of words is mapped by an arbitrary function.

Explanation of symbols

200 Filtering device
201 CPU
202 ROM
203 RAM
204 HDD
205 HD
206 FDD
207 FD
208 CD-RW drive
209 CD-RW
210 Display
211 Keyboard
212 Mouse
213 Network I/F
214 Communication cable
215 Printer
220 Bus
301 Processing target data
302 User determination information
310 Analysis unit
320 First calculation unit
330 First discretization unit
340 Second calculation unit
350 Second discretization unit
360 Reception unit
370 Adjustment unit

Claims

A filtering processing method for determining whether processing target data is data desired by a user, the method comprising:
an analysis step of analyzing the elements constituting the processing target data;
a first calculation step of calculating, for each element analyzed in the analysis step, a probability of belonging to the processing target data;
a first discretization step of discretizing each element analyzed in the analysis step into a valid value or an invalid value based on the probability calculated in the first calculation step;
a second calculation step of calculating, for each of the elements including the elements discretized into valid or invalid values in the first discretization step, a probability of belonging to the processing target data; and
a second discretization step of determining whether the processing target data is the desired data by discretizing each of the elements into a valid value or an invalid value based on the probability calculated in the second calculation step.

The filtering processing method according to claim 1, further comprising:
an accepting step of accepting a judgment as to whether the determination made in the second discretization step is correct; and
an adjustment step of adjusting, for each element constituting processing target data for which a judgment of erroneous determination was accepted in the accepting step, the probability calculated in the first calculation step.

At least one of the first discretization step and the second discretization step performs the discretization using a value obtained by mapping each element by an arbitrary function. 2. The filtering processing method according to 2.

The filtering processing method according to claim 1 or 2, wherein at least one of the first discretization step and the second discretization step discretizes the probability of each element into a valid value or an invalid value based on a result of comparison with a preset threshold value.

The filtering processing method according to any one of claims 1 to 4, wherein, when the processing target data is e-mail data, the analysis step analyzes the header and body of the e-mail data.

A filtering processing program for causing a computer to determine whether processing target data is data desired by a user, the program causing the computer to execute:
an analysis step of analyzing the elements constituting the processing target data;
a first calculation step of calculating, for each element analyzed in the analysis step, a probability of belonging to the processing target data;
a first discretization step of discretizing each element analyzed in the analysis step into a valid value or an invalid value based on the probability calculated in the first calculation step;
a second calculation step of calculating, for each of the elements including the elements discretized into valid or invalid values in the first discretization step, a probability of belonging to the processing target data; and
a second discretization step of determining whether the processing target data is the desired data by discretizing each of the elements into a valid value or an invalid value based on the probability calculated in the second calculation step.

The filtering processing program according to claim 6, further causing the computer to execute:
an accepting step of accepting a judgment as to whether the determination made in the second discretization step is correct; and
an adjustment step of adjusting, for each element constituting processing target data for which a judgment of erroneous determination was accepted in the accepting step, the probability calculated in the first calculation step.

A filtering device that determines whether processing target data is data desired by a user, the device comprising:
analyzing means for analyzing the elements constituting the processing target data;
first calculating means for calculating, for each element analyzed by the analyzing means, a probability of belonging to the processing target data;
first discretizing means for discretizing each element analyzed by the analyzing means into a valid value or an invalid value based on the probability calculated by the first calculating means;
second calculating means for calculating, for each of the elements including the elements discretized into valid or invalid values by the first discretizing means, a probability of belonging to the processing target data;
second discretizing means for determining whether the processing target data is the desired data by discretizing each of the elements into a valid value or an invalid value based on the probability calculated by the second calculating means;
accepting means for accepting a judgment as to whether the determination made by the second discretizing means is correct; and
adjusting means for adjusting, for each element constituting processing target data for which a judgment of erroneous determination was accepted by the accepting means, the probability calculated by the first calculating means.