WO2002095620A2 - Method for discretizing attributes of a database
Method for discretizing attributes of a database
- Publication number
- WO2002095620A2 (application PCT/FR2002/001711)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- attribute
- discretization
- groups
- elementary groups
- pair
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Definitions
- the present invention relates to a method for discretizing attributes of a database.
- the invention finds particular application in the statistical exploitation of data, in particular in the field of supervised learning.
- Data mining generally aims to explore, classify and extract the underlying association rules within a database. It is notably used to build classification or prediction models.
- the classification makes it possible to identify categories within the database from combinations of attributes, then to organize the data according to these categories. For example, if the database relates to purchases of products by consumers, these could be classified into different categories: loyal customers, occasional customers, customers looking for discounted products, customers looking for high-end products, etc.
- Prediction aims to describe how one or more attributes of the database will behave in the future. In the example of the purchasing database mentioned above, it may be interesting to predict the behavior of these consumers based on a drop or an increase in the price of a particular product.
- supervised data mining is the construction of a predictive model aimed at predicting a given attribute.
- This construction consists in searching among the attributes of the database considered to identify the one or those which have the strongest statistical dependence with a target attribute and in describing this dependence. For example, if we have classified consumers according to their annual purchase amounts into different consumption categories (heavy consumption, medium consumption, low consumption), it will be interesting to determine which attributes of the purchasing database are the most correlated with (or equivalently, the least statistically independent of) the attribute giving the consumption class. Note that instead of the target attribute
- the values (also called modalities) taken by an attribute can be numeric (for example an amount of purchases) or symbolic (for example a category of consumption).
- certain methods of supervised data mining require a "discretization" of the numerical attributes.
- by discretization of a numerical attribute is meant here a division of the domain of the values taken by an attribute into a finite number of intervals. If the domain in question is a range of continuous values, the discretization will result in a quantization of this range. If this domain already consists of ordered discrete values, the discretization will have the function of grouping these values into groups of consecutive values.
- top-down methods start from the full interval to be discretized and seek the best cut-off point in the interval by optimizing a predetermined criterion.
- bottom-up methods start from elementary intervals and seek the best fusion of two adjacent intervals by optimizing a predetermined criterion. In both cases, they are applied iteratively until a stop criterion is satisfied.
- Table 1 shows the contingency table of the variables S and T with the following conventions: $n_{ij}$ is the number of individuals observed for the $i$-th modality of the variable S and the $j$-th modality of the variable T, also called the observed count of cell $(i, j)$; $n_{i.}$ is the total number of individuals for the $i$-th modality of the variable S, also called the observed count of row $i$; $n_{.j}$ is the total number of individuals for the $j$-th modality of the variable T, also called the observed count of column $j$; $N$ is the total number of individuals.
- I and J denote respectively the number of modalities of the attribute S and the number of modalities of the attribute T.
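As an illustration of these conventions, the following minimal sketch builds a contingency table together with its row, column and grand totals. The function name `contingency_table` and the toy data are hypothetical and do not come from the patent.

```python
from collections import Counter

def contingency_table(source_values, target_values):
    """Return (rows, cols, table); table[i][j] counts the individuals having
    the i-th source modality and the j-th target modality."""
    rows = sorted(set(source_values))
    cols = sorted(set(target_values))
    counts = Counter(zip(source_values, target_values))
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

# Hypothetical toy data: S is a numeric source attribute, T a two-class target.
S = [1, 1, 2, 2, 2, 3]
T = ["a", "b", "a", "a", "b", "b"]
rows, cols, n = contingency_table(S, T)
row_totals = [sum(r) for r in n]          # observed count of each row
col_totals = [sum(c) for c in zip(*n)]    # observed count of each column
N = sum(row_totals)                       # total number of individuals
```

Here the number of rows plays the role of I and the number of columns the role of J.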
- $n_{i.}$ is the count of line $i$ and $n_{i+1.}$ is the count of line $i+1$
- the $p_{i+1,j}$ represent the observed proportions of modalities of T for the line $i+1$.
- the local probability distribution $q'_1, q'_2, \ldots, q'_J$ of the modalities of the target attribute can be expressed by:
- $\chi^2_{(i,i+1)}$ is a random variable following a $\chi^2$ law with $J-1$ degrees of freedom.
- the ChiMerge method proposes to merge the lines $i$ and $i+1$ if:
- prob(α, K) denotes the probability that $\chi^2 > \alpha$ for the $\chi^2$ law with K degrees of freedom and pn is a predetermined threshold value setting the method.
- the value prob(α, K) is obtained from a classic $\chi^2$ table giving the value of α as a function of prob(α, K) and K.
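Instead of a printed table, prob(α, K) can also be computed numerically. The sketch below uses the standard closed-form series for the upper tail of the chi-square law with an integer number of degrees of freedom, relying only on the Python standard library. The function name `chi2_tail` is an assumption made for this illustration.

```python
import math

def chi2_tail(alpha, k):
    """prob(alpha, K): P(X > alpha) for X following a chi-square law
    with k >= 1 (integer) degrees of freedom."""
    half = alpha / 2.0
    if k % 2 == 0:
        # Even k = 2m: exp(-a/2) * sum_{j=0}^{m-1} (a/2)^j / j!
        term = total = 1.0
        for j in range(1, k // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd k = 2m+1: erfc(sqrt(a/2)) + exp(-a/2) * sum_{j=1}^{m} (a/2)^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for j in range(1, (k - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (j + 0.5)
    return total
```

For example, `chi2_tail(3.841, 1)` and `chi2_tail(5.991, 2)` are both close to 0.05, in agreement with the usual 5 % critical values.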
- Condition (5) expresses that the probability of independence of S and T in view of the two lines considered is less than a threshold value.
- the merging of consecutive lines is iterated as long as condition (5) is satisfied.
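The local statistic on which ChiMerge relies can be sketched as the chi-square of the 2 x J table formed by two adjacent lines. The function name `pair_chi2` is hypothetical, and the comparison with the threshold of condition (5), which is not reproduced in this text, is deliberately left out.

```python
def pair_chi2(row_a, row_b):
    """Chi-square of the 2 x J contingency table made of two adjacent lines."""
    total = sum(row_a) + sum(row_b)
    value = 0.0
    for j in range(len(row_a)):
        column = row_a[j] + row_b[j]
        for row in (row_a, row_b):
            expected = sum(row) * column / total   # expected count of the cell
            if expected > 0:
                value += (row[j] - expected) ** 2 / expected
    return value
```

Two lines with identical class proportions give a statistic of 0, while two lines with opposite class memberships give a large value.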
- the merger of two lines leads to the grouping of their modalities and the summation of their counts. For example in the case of a numeric attribute with continuous values, we have before fusion:
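A minimal sketch of such a merger for a numeric attribute is given below, assuming that each line is described by a hypothetical `(low, high)` interval: the two intervals are concatenated and the counts are added cell by cell.

```python
def merge_rows(intervals, table, i):
    """Merge lines i and i+1: concatenate their intervals and sum their counts."""
    merged_interval = (intervals[i][0], intervals[i + 1][1])
    merged_counts = [a + b for a, b in zip(table[i], table[i + 1])]
    new_intervals = intervals[:i] + [merged_interval] + intervals[i + 2:]
    new_table = table[:i] + [merged_counts] + table[i + 2:]
    return new_intervals, new_table

# Hypothetical example with three intervals and two target classes.
intervals = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
table = [[4, 1], [3, 2], [0, 6]]
new_intervals, new_table = merge_rows(intervals, table, 0)
```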
- a first problem raised by the use of the ChiMerge method is the choice of the parameter pn, which must not be too high, on pain of merging all the lines, nor too low, on pain of merging no pair of them. In practice, it is very difficult to find a compromise.
- a second intrinsic problem with this method is to operate locally without taking into account all the modalities (or the number of intervals) of the source attribute. We do not know a priori if the result of the discretization is globally optimal on this set.
- the ChiMerge method is limited to one-dimensional discretization in the sense that it can operate on only one source attribute at a time and not on a p-tuple of attributes.
- the ChiMerge method does not allow the probability of independence between a source attribute and a target attribute to be measured and, consequently, does not allow source attributes to be ranked according to their probability of independence with respect to the target attribute.
- the objective of the present invention is to propose a method for discretizing attributes which does not have the drawbacks and limitations stated above.
- the invention is defined by a method for discretizing an attribute of a database containing a population of individuals, said attribute, called the source attribute, being able to take several modalities, said method comprising a first step in which said modalities of the source attribute are grouped into elementary groups, a second step in which one determines, from the contingency table of the source attribute and a target attribute, among a set of pairs of elementary groups, the pair of elementary groups whose fusion most strongly decreases the probability of independence of the source attribute and the target attribute, and a third step in which the pair of elementary groups thus determined is merged, said second and third steps being iterated as long as there is a pair of elementary groups making it possible to reduce said probability of independence.
- the variation of the $\chi^2$ of the contingency table is calculated before and after fusion of said pair.
- the variations of $\chi^2$ associated with the different pairs are then sorted in the form of a list of decreasing values and the first pair in the list is selected.
- the pair of elementary groups being selected, said pair is merged if the probability of $\chi^2$ relating to the contingency table after merging said pair is less than the probability of $\chi^2$ relating to the contingency table before merging.
- the probabilities of $\chi^2$ relating to the contingency table before and after fusion are expressed in a logarithmic manner.
- said set of pairs of elementary groups consists of all the pairs of neighboring groups in the sense of a predetermined neighborhood relationship.
- One preferably searches among the pairs of neighboring elementary groups for those comprising at least one group having at least one expected count per cell of the contingency table less than a predetermined minimum count, and they are identified as priority pairs by means of identification information. In this case, if there is one or more priority pairs, the priority pair producing the highest value of $\chi^2$ after fusion is selected.
- when the source attribute is a one-dimensional numeric attribute, the neighboring elementary groups are formed by adjacent intervals.
- when the source attribute is a multidimensional numeric attribute formed by a plurality of one-dimensional numeric attributes, the individuals of the population being represented by points in the space of said attributes, said elementary groups are the Voronoi cells in this space containing said points.
- the source attribute is of symbolic type.
- the invention also relates to a method for evaluating the dependence of a two-dimensional numeric attribute, formed by a pair of one-dimensional numeric attributes, with respect to a target attribute.
- the individuals of the population are represented by points in the plane of said attributes.
- the two-dimensional attribute is discretized by the multidimensional discretization method mentioned above and is visualized by means of visualization of the groups of Voronoi cells fused by said method.
- the invention relates to data mining software comprising a discretization program of at least one attribute of a database, such that its execution on a computer performs the steps of the method described above.
- Fig. 1 illustrates in the form of a flowchart the method for discretizing attributes according to an embodiment of the invention
- Fig. 2 illustrates a first example of discretization of a symbolic attribute
- Fig. 3 illustrates a second example of discretization of a symbolic attribute before and after fusion
- Fig. 4 shows an example of a Voronoi diagram
- Fig. 5 shows the Delaunay diagram associated with the Voronoi diagram of FIG. 4
- Fig. 6 represents a set of individuals projected on the plane of two numerical attributes
- Fig. 7 shows the Delaunay diagram associated with the set of individuals in FIG. 6
- Fig. 8 represents the discretization zones associated with the set of individuals in FIG. 5.
- a first general idea underlying the invention is to discretize a source attribute by optimizing a statistical criterion relating to the entire contingency table.
- a second general idea on which the invention is based is to extrapolate this discretization to the multi-dimensional case by using a Delaunay graph.
- Expression (8) can be expressed simply as a function of the value of $\chi^2$ before the merger:
- $\Delta\chi^2_{(i,i+1)}$ is the variation of $\chi^2$ resulting from the fusion of lines $i$ and $i+1$.
- the value of $\Delta\chi^2_{(i,i+1)}$ can be calculated explicitly as a function of the proportions of counts in lines $i$ and $i+1$:
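The explicit expression is not reproduced in this text. Under the usual definition of the chi-square of a contingency table, a direct calculation gives $\Delta\chi^2_{(i,i+1)} = -\frac{n_{i.}\,n_{i+1.}}{n_{i.}+n_{i+1.}} \sum_j \frac{(p_{ij}-p_{i+1,j})^2}{q_j}$, where $p_{ij} = n_{ij}/n_{i.}$ and $q_j = n_{.j}/N$; this notation is an assumption of the present sketch and may differ from that of relation (11). The code below checks the closed form against a brute-force recomputation on a hypothetical table.

```python
def chi2(table):
    """Chi-square of a whole contingency table (list of rows of counts)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    value = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                value += (observed - expected) ** 2 / expected
    return value

def delta_chi2(table, i):
    """Closed-form variation of chi-square when lines i and i+1 are merged
    (both lines are assumed to be non-empty)."""
    n = sum(sum(r) for r in table)
    a, b = table[i], table[i + 1]
    na, nb = sum(a), sum(b)
    total = 0.0
    for j, col in enumerate(zip(*table)):
        q = sum(col) / n                      # global proportion of class j
        if q > 0:
            total += (a[j] / na - b[j] / nb) ** 2 / q
    return -na * nb / (na + nb) * total

# Hypothetical 3 x 3 table used only to check the identity numerically.
table = [[10, 2, 1], [3, 8, 4], [1, 5, 9]]
merged = [[13, 10, 5], [1, 5, 9]]             # lines 0 and 1 summed
brute_force = chi2(merged) - chi2(table)
closed_form = delta_chi2(table, 0)
```

The closed form only involves the two merged lines and the column proportions, so the variation can be updated locally without recomputing the whole table.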
- condition (12) reflects a decrease in the probability of independence of S and T after merging of the lines $i$ and $i+1$.
- the value of $\chi^2$ can only decrease with fusion. Since prob(α, K) is a decreasing function of α and an increasing function of K, the relation (12) can be satisfied only thanks to the reduction in the number of degrees of freedom.
- the decrease in the probability of independence will be all the greater as $\Delta\chi^2_{(i,i+1)}$ is small in absolute value, that is to say, from relation (11), as the proportions observed for the lines considered are closer, and this for the smallest proportions q.
- If condition (12) is satisfied, the lines $i_0$ and $i_0+1$ are merged. On the other hand, if condition (12) is not verified, then it is not verified for any index $i$, as a result of the decrease in prob(α, K) as a function of α. The merging process is then stopped.
- the method described above leads to an ad hoc discretization of the domain of modalities, that is to say to a discretization which minimizes the independence between the source attribute and the target attribute over the entire domain.
- the discretization method makes it possible to group adjacent intervals having similar prediction behaviors with respect to the target attribute, the grouping being stopped when it affects the quality of prediction, in other words when it no longer decreases the probability of independence of the attributes.
- We obtain by successive mergers a contingency table whose number of lines is reduced and whose numbers per cell increase.
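The iterative principle described above can be sketched as follows. This is a simplified illustration, not the patented implementation: it recomputes the chi-square of every candidate table instead of maintaining a sorted list of variations, it ignores the minimum-count priority rule of condition (13), and it compares plain floating-point probabilities rather than their logarithms, so very small probabilities may underflow. All function names are hypothetical.

```python
import math

def chi2_tail(alpha, k):
    """P(X > alpha) for a chi-square law with k >= 1 degrees of freedom."""
    half = alpha / 2.0
    if k % 2 == 0:
        term = total = 1.0
        for j in range(1, k // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for j in range(1, (k - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (j + 0.5)
    return total

def chi2(table):
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    value = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                value += (observed - expected) ** 2 / expected
    return value

def independence_probability(table):
    used_cols = sum(1 for c in zip(*table) if sum(c) > 0)
    df = (len(table) - 1) * (used_cols - 1)
    # With a single line (or a single class) nothing contradicts independence.
    return chi2_tail(chi2(table), df) if df > 0 else 1.0

def merge(table, i):
    merged = [a + b for a, b in zip(table[i], table[i + 1])]
    return table[:i] + [merged] + table[i + 2:]

def discretize(table):
    """Greedily merge adjacent lines while the probability of independence decreases."""
    table = [list(r) for r in table]
    groups = [[i] for i in range(len(table))]
    while len(table) > 1:
        # Best candidate: the merge leaving the highest chi-square.
        best = max(range(len(table) - 1), key=lambda i: chi2(merge(table, i)))
        candidate = merge(table, best)
        if independence_probability(candidate) >= independence_probability(table):
            break                      # no merge decreases the probability: stop
        table = candidate
        groups[best:best + 2] = [groups[best] + groups[best + 1]]
    return groups, table

groups, final_table = discretize([[10, 0], [10, 0], [0, 10], [0, 10]])
```

On this toy table the first two lines and the last two lines are merged, and the final merge into a single line is refused because it would raise the probability of independence to 1.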
- Fig. 1 illustrates the algorithm of an example of a discretization method according to the invention.
- the algorithm begins with a step 100 of partitioning the domain of values of the source law into ordered elementary intervals.
- the value of $\chi^2$ for the contingency table and the values $\chi^2_{(i)}$ for the I rows of the table are calculated at 110.
- the values $\Delta\chi^2_{(i,i+1)}$ are then deduced from the values $\chi^2_{(i)}$ in step 120 and sorted by decreasing values in the form of a list at 130.
- Each element of the list corresponds to the possible fusion of a pair of lines $i$ and $i+1$.
- Step 140 tests whether the minimum count condition (13) is verified. If yes, we go directly to the test of step 150. If not, we continue with step 145.
- In step 145, priority is given (by means of flags) to the pairs of lines of which at least one has not reached the minimum count, and in 165 the first priority pair of the list is selected, which we will note $(i_0, i_0+1)$. The process continues at 170.
- In step 150, it is tested whether the first element of the list satisfies condition (12). If not, the process ends at 190. If so, the first pair in the list is selected at 160, which we will also note $(i_0, i_0+1)$, and we continue with step 170.
- In step 170, the lines $i_0$ and $i_0+1$ of the selected pair are merged, i.e. the intervals $S_{i_0}$ and $S_{i_0+1}$ are concatenated.
- the new value of $\chi^2$ is then calculated in 180, as well as the new values of $\Delta\chi^2_{(i-1,i)}$ and $\Delta\chi^2_{(i,i+1)}$ for the adjacent intervals, if they exist.
- the list of values $\Delta\chi^2_{(i,i+1)}$ is updated: the old values $\Delta\chi^2_{(i-1,i)}$ and $\Delta\chi^2_{(i,i+1)}$ are deleted and the new values are stored.
- the list of values $\Delta\chi^2_{(i,i+1)}$ is advantageously organized in the form of a balanced binary search tree making it possible to manage insertions / deletions while maintaining the order relation in the list. Thus, it is not necessary to completely sort the list at each step.
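The Python standard library has no balanced binary search tree, so the sketch below substitutes a sorted list maintained with `bisect`. It keeps the list ordered without a full re-sort, but insertions and deletions shift elements and therefore cost linear time, unlike the logarithmic cost of the balanced tree mentioned in the text. The class name `SortedDeltas` is hypothetical.

```python
import bisect

class SortedDeltas:
    """Ordered collection of (delta_chi2, line_index) pairs."""

    def __init__(self):
        self._items = []                       # kept sorted in increasing order

    def insert(self, delta, index):
        bisect.insort(self._items, (delta, index))

    def remove(self, delta, index):
        pos = bisect.bisect_left(self._items, (delta, index))
        if pos < len(self._items) and self._items[pos] == (delta, index):
            del self._items[pos]

    def best(self):
        # Variations are negative: the best merge has the largest value,
        # i.e. the first element of the list sorted by decreasing values.
        return self._items[-1]

deltas = SortedDeltas()
for index, delta in enumerate([-4.0, -0.5, -2.5]):
    deltas.insert(delta, index)
first_choice = deltas.best()                   # pair of lines (1, 2)
deltas.remove(-0.5, 1)                         # the pair has been merged
second_choice = deltas.best()
```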
- the list of flags is also updated. After the update, the process returns to test step 140.
- According to a variant, the list is constituted by the (positive) values of $\chi^2$ obtained after fusion of lines $i$ and $i+1$ instead of being constituted by the (negative) values $\Delta\chi^2_{(i,i+1)}$.
- At the end of the discretization process, we have the value of $\chi^2$ of the discretized attribute.
- the numerical modalities are first ordered to form the rows of the contingency table for S and T and then grouped into elementary groups; an elementary group may, if necessary, contain only one element.
- the discretization method operates on the same principle as above, merging the elementary groups as long as the probability of independence of S and T decreases.
- the discretization method can still operate on symbolic attributes, with the difference that there is not necessarily a total order relation between the modalities of the attribute. If such an order relation exists, we can come back to the previous case by ordering the modalities according to this order relation.
- Fig. 2 illustrates this situation: the individuals are grouped into elementary groups $G_1, G_2, \ldots, G_I$, each group containing the individuals relating to a modality or to an interval of modalities (within the meaning of the aforementioned order relation).
- the groups are equivalent to the rows in the contingency table. They can be ordered within a linear graph, each node corresponding to a group. The fusion can only be carried out according to the arcs of this graph, between neighboring groups.
- the operation of the discretization method will be illustrated using an example relating to a database containing attributes of flowers of the Iris family.
- the population of the database considered is 150 individuals.
- the source attribute is a numeric attribute with continuous values and the target attribute is a symbolic attribute with 3 modalities.
- the contingency table is given below:
- the $\chi^2$ associated with the discretized law is 70.74, which corresponds to a probability of independence of $1.66 \times 10^{-14}$ ($\chi^2$ law with 4 degrees of freedom). Two merges of intervals are still possible; the best of them is the first fusion, which corresponds to a $\chi^2$ of value 54.17.
- the associated probability of independence is $1.73 \times 10^{-12}$ ($\chi^2$ law with 2 degrees of freedom). This merger does not respect condition (12) (it increases the probability of independence) and is therefore refused.
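The comparison made in this example can be checked numerically. For an even number of degrees of freedom the chi-square tail has a simple closed form, which the sketch below evaluates for the two quoted values of the statistic. The function name `chi2_tail_even` is hypothetical, and because the statistics are quoted with only two decimals the computed probabilities can only be expected to be of the same order as the quoted ones.

```python
import math

def chi2_tail_even(alpha, k):
    """P(X > alpha) for a chi-square law with an even number k of degrees of freedom."""
    half = alpha / 2.0
    term = total = 1.0
    for j in range(1, k // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

p_before = chi2_tail_even(70.74, 4)   # three intervals, three classes
p_after = chi2_tail_even(54.17, 2)    # two intervals after the candidate merge
merge_accepted = p_after < p_before   # condition (12)
```

The probability after the candidate merge is larger than before it, so `merge_accepted` is false and the merge is refused, as stated above.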
- the attribute "sep width" has been discretized in 3 intervals. In the first interval, the class Iris setosa is very rare. In the second, there is a balance between the three classes and in the last, the Iris setosa class is by far the most frequent. This partition is the one that minimizes the probability of independence of the attributes "width of sepal" and "class of the flower".
- Consider a two-dimensional numerical attribute formed by a pair of numerical attributes $(S_1, S_2)$.
- Each individual can then be represented as a point having as coordinates the modalities of $S_1$ and $S_2$ of the individual.
- the population of N individuals in the database can thus be "projected" into a plane $(S_1, S_2)$ in the form of a set £ of points.
- the neighborhood relationships between these points can be viewed from the Voronoi diagram of the set £.
- the Voronoi diagram associated with a set £ of points is a partition of space (here a plane) into cells each containing a point of £, each cell being defined as the set of points in space which are closer to a given point of £ than to any other point of £.
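This definition translates directly into a membership test: a location belongs to the cell of the point of £ that is nearest to it. The sketch below is a brute-force illustration with a hypothetical function name; it does not construct the cell polygons.

```python
import math

def voronoi_cell_index(sites, location):
    """Index of the site whose Voronoi cell contains the given location."""
    return min(range(len(sites)), key=lambda k: math.dist(sites[k], location))

# Hypothetical set of three points in the plane.
sites = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
cell = voronoi_cell_index(sites, (3.0, 0.5))
```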
- a cell is formed of a convex polyhedron (here a polygon) surrounding a point of £, each face of the polyhedron lying in the perpendicular bisector plane of the point of £ associated with the cell and a neighboring point.
- a Voronoi diagram associated with a set of points is shown in Fig. 4. From the Voronoi diagram we can construct a dual diagram, called the Delaunay diagram, connecting the points of £ belonging to adjacent cells. Fig. 5 shows the Delaunay diagram associated with the Voronoi diagram of Fig. 4.
- each arc of the Delaunay graph represents a neighborhood relationship between two points of £.
- the discretization method builds the Delaunay graph of £ and uses the arcs of the Delaunay graph to partition the space into elementary areas. More precisely, the graph consists of direct arcs and indirect arcs.
- the direct arcs between two nodes only pass through the two adjacent cells associated with these nodes.
- for a point located on a direct arc, the nearest neighbor is always one of the two points of the two adjacent cells.
- the indirect arcs pass through at least a third Voronoi cell.
- for a point located on an indirect arc, the nearest neighbor may be a third point not belonging to one of the two adjacent cells.
- the indirect arcs are eliminated. Only the direct arcs, translating a direct relation of proximity are taken into account during the initialization of the method of discretization.
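One possible way to select the direct arcs is sketched below. It assumes that an arc is direct exactly when the segment joining its two points crosses no third Voronoi cell. Because the cells are convex, this holds when no third point is strictly closer to the midpoint of the segment than its two end points, which is the classical Gabriel graph criterion, and every pair passing the test is also joined by an arc of the Delaunay graph. The brute-force function `direct_arcs` is hypothetical and is not taken from the patent.

```python
from itertools import combinations

def direct_arcs(points):
    """Pairs (a, b) of point indices whose joining segment crosses only
    the Voronoi cells of a and b (midpoint test, cubic-time brute force)."""
    arcs = []
    for a, b in combinations(range(len(points)), 2):
        (xa, ya), (xb, yb) = points[a], points[b]
        mx, my = (xa + xb) / 2, (ya + yb) / 2            # midpoint of the segment
        r2 = ((xa - xb) ** 2 + (ya - yb) ** 2) / 4       # squared half-length
        if all((points[c][0] - mx) ** 2 + (points[c][1] - my) ** 2 >= r2
               for c in range(len(points)) if c not in (a, b)):
            arcs.append((a, b))
    return arcs

# Hypothetical example: the third point lies almost on the segment joining the first two.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1)]
arcs = direct_arcs(points)
```

In this example the segment joining the first two points passes through the cell of the third point, so that pair is rejected, while the two arcs ending at the third point are kept.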
- the fusion of Voronoi cells according to the direct arcs of the Delaunay graph provides the elementary zones.
- After having partitioned the space into elementary zones, the discretization method operates iteratively by merging zones, the only authorized mergers being those indicated by a (direct) arc in the Delaunay graph. As in the one-dimensional case, the fusion of two zones is carried out only if condition (12) is satisfied, that is to say only if this fusion leads to a reduction in the probability of independence of the attributes S and T.
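The restriction of mergers to zones linked by an arc can be sketched as follows, assuming a hypothetical representation in which `zone_of[k]` gives the zone label of point `k` and `arcs` lists the direct arcs as pairs of point indices. The statistical test of condition (12) is not included here.

```python
def candidate_merges(zone_of, arcs):
    """Pairs of distinct zones linked by at least one arc."""
    return sorted({tuple(sorted((zone_of[a], zone_of[b])))
                   for a, b in arcs if zone_of[a] != zone_of[b]})

def merge_zones(zone_of, keep, drop):
    """Relabel every point of zone `drop` with the label `keep`."""
    return [keep if z == drop else z for z in zone_of]

# Hypothetical example: four points, three zones, arcs forming a chain.
zone_of = [0, 0, 1, 2]
arcs = [(0, 1), (1, 2), (2, 3)]
before = candidate_merges(zone_of, arcs)
zone_of_after = merge_zones(zone_of, keep=1, drop=2)
after = candidate_merges(zone_of_after, arcs)
```

Zones 0 and 2 share no arc, so they are never proposed for a direct merge; they can only end up together through successive mergers of neighboring zones.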
- the discretization provides connected regions - each region being in fact a connected union of Voronoi cells. Each region groups together individuals who are statistically homogeneous with regard to the target attribute and, conversely, two distinct regions have distinct behavior with regard to this attribute.
- the independence probability value obtained at the end of the discretization makes it possible to compare the pairs (generally the n-tuples) of continuous attributes and to classify them according to their predictive value of a target attribute.
- the multidimensional discretization method still applies to a symbolic multidimensional attribute, that is to say to an attribute whose components are symbolic attributes.
- FIG. 6 represents a population of individuals of a database projected on the plane defined by two continuous numeric attributes.
- the target attribute is the class of the individuals, which can take the "class 1" modality represented by a diamond or the "class 2" modality represented by a point.
- Fig. 7 shows the associated Delaunay diagram.
- the discretization method as set out above leads to four zones, indicated in Fig. 8 by different gray levels. These connected zones are formed by the fusion of Voronoi cells, each containing an individual from the initial population.
- the discretization makes it possible to visualize the behavior of the couple of numerical attributes with respect to the target attribute. In the example shown, a spiral dependence relationship will be observed between the pair of attributes and the target attribute.
- the contingency table is actually the following:
- zones 1 and 2 are overwhelmingly made up of class 2 individuals, while zone 3 is essentially made up of class 1 individuals.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Complex Calculations (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20020735548 EP1389325A2 (fr) | 2001-05-23 | 2002-05-21 | Procede de discretisation d'attributs d'une base de donnees |
US10/478,880 US20040158548A1 (en) | 2001-05-23 | 2002-05-21 | Method for dicretizing attributes of a database |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR01/07006 | 2001-05-23 | ||
FR0107006A FR2825168A1 (fr) | 2001-05-23 | 2001-05-23 | Procede de discretisation d'attributs d'une base de donnees |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002095620A2 true WO2002095620A2 (fr) | 2002-11-28 |
WO2002095620A3 WO2002095620A3 (fr) | 2003-03-06 |
Family
ID=8863733
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FR2002/001711 WO2002095620A2 (fr) | 2001-05-23 | 2002-05-21 | Procede de discretisation d'attributs d'une base de donnees |
Country Status (4)
Country | Link |
---|---|
US (1) | US20040158548A1 (fr) |
EP (1) | EP1389325A2 (fr) |
FR (1) | FR2825168A1 (fr) |
WO (1) | WO2002095620A2 (fr) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2849249A1 (fr) * | 2002-12-19 | 2004-06-25 | France Telecom | Methode de discretisation/groupage d'un attribut source ou d'un groupe attributs source d'une base de donnees |
US7644083B1 (en) * | 2004-09-30 | 2010-01-05 | Teradata Us, Inc. | Efficiently performing inequality joins |
US8135667B2 (en) * | 2009-12-31 | 2012-03-13 | Teradata Us, Inc. | System, method, and computer-readable medium that facilitate in-database analytics with supervised data discretization |
US11314826B2 (en) | 2014-05-23 | 2022-04-26 | Samsung Electronics Co., Ltd. | Method for searching and device thereof |
US9990433B2 (en) | 2014-05-23 | 2018-06-05 | Samsung Electronics Co., Ltd. | Method for searching and device thereof |
TWI684936B (zh) * | 2017-04-19 | 2020-02-11 | 鎮裕貿易股份有限公司 | 具商業模式功能之售後系統平台 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0877010A (ja) * | 1994-09-07 | 1996-03-22 | Hitachi Ltd | データ分析方法および装置 |
JP2001519070A (ja) * | 1997-03-24 | 2001-10-16 | クイーンズ ユニバーシティー アット キングストン | 一致検出の方法、製品および装置 |
US6192360B1 (en) * | 1998-06-23 | 2001-02-20 | Microsoft Corporation | Methods and apparatus for classifying text and for building a text classifier |
US6742003B2 (en) * | 2001-04-30 | 2004-05-25 | Microsoft Corporation | Apparatus and accompanying methods for visualizing clusters of data and hierarchical cluster classifications |
-
2001
- 2001-05-23 FR FR0107006A patent/FR2825168A1/fr not_active Withdrawn
-
2002
- 2002-05-21 US US10/478,880 patent/US20040158548A1/en not_active Abandoned
- 2002-05-21 WO PCT/FR2002/001711 patent/WO2002095620A2/fr not_active Application Discontinuation
- 2002-05-21 EP EP20020735548 patent/EP1389325A2/fr not_active Withdrawn
Non-Patent Citations (4)
Title |
---|
BAY S D: "Multivariate discretization for set mining" KNOWLEDGE AND INFORMATION SYSTEMS, NOV. 2001, SPRINGER-VERLAG, UK, vol. 3, no. 4, pages 491-512, XP002204468 ISSN: 0219-1377 * |
DOUGHERTY J ET AL: "Supervised and unsupervised discretization of continuous features" MACHINE LEARNING. PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING, PROCEEDINGS OF 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING, TAHOE CITY, CA, USA, 9-12 JULY 1995, pages 194-202, XP002204467 1995, San Francisco, CA, USA, Morgan Kaufmann Publishers, USA * |
HUAN LIU ET AL: "Chi2: feature selection and discretization of numeric attributes" PROCEEDINGS. SEVENTH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (CAT. NO.95CB35878), PROCEEDINGS OF 7TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, HERNDON, VA, USA, 5-8 NOV. 1995, pages 388-391, XP002204465 1995, Los Alamitos, CA, USA, IEEE Comput. Soc. Press, USA ISBN: 0-8186-7312-5 * |
KERBER R: "ChiMerge: discretization of numeric attributes" AAAI-92. PROCEEDINGS TENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, SAN JOSE, CA, USA, 12-16 JULY 1992, pages 123-128, XP002204466 1992, Menlo Park, CA, USA, AAAI Press, USA * |
Also Published As
Publication number | Publication date |
---|---|
FR2825168A1 (fr) | 2002-11-29 |
EP1389325A2 (fr) | 2004-02-18 |
WO2002095620A3 (fr) | 2003-03-06 |
US20040158548A1 (en) | 2004-08-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
AK | Designated states |
Kind code of ref document: A3 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2002735548 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 10478880 Country of ref document: US |
|
WWP | Wipo information: published in national office |
Ref document number: 2002735548 Country of ref document: EP |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
NENP | Non-entry into the national phase |
Ref country code: JP |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: JP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 2002735548 Country of ref document: EP |