Background Art
With the development of data mining technology, target clustering techniques are increasingly applied to class prediction. Common application scenarios include image segmentation, biomedical identification and educational resource classification. Taking educational resource classification as an example, several different types of educational resources can be clustered according to features such as type (video, text, exercise), usage duration (the average length of time a resource is used) and usage frequency (the number of times a resource is used in a given term). The result can inform the development of educational resources from an application perspective. Further, analyzing the result jointly with student information data can make the development of educational resources more targeted.
The main purpose of target clustering is to assign similar targets to the same cluster, so that the similarity of targets within a cluster is as high as possible and the similarity of targets in different clusters is as low as possible. In traditional clustering methods, each target can belong to only one cluster; such methods are hard clustering methods. As applications have deepened, however, hard clustering has run into problems. One of them is the uncertain boundary between clusters: some targets may lie between several clusters. This is beyond the scope of hard clustering and is exactly the kind of problem that soft clustering addresses.
The most important class of technical solutions in soft clustering models clusters using rough sets or a similar theory, models targets using fuzzy sets or a similar theory, and finally substitutes the modeled clusters and targets into the framework of the traditional k-means clustering algorithm.
Two problems remain prominent in this class of soft clustering methods. On the one hand, clusters are modeled with a variety of similar theories; besides rough sets there are shadowed sets and others. These theories all regard a cluster as three regions: one made up of the targets that definitely belong to the cluster, one made up of the targets that may belong to the cluster, and one made up of the targets that cannot belong to the cluster. The applicant has found that these theories are internally consistent and can be unified under three-way decision theory, yet current soft clustering methods do not use three-way decision theory to model clusters. On the other hand, when the cluster center is calculated, different weights are applied to targets in different regions, and these weights are determined empirically; as a consequence, the cluster center is very sensitive to the weight values. Both problems urgently need to be solved.
Summary of the invention
In view of the defects of the prior art, the technical purpose of the present invention is to provide a target clustering method that models clusters using three-way decision theory and can cluster targets more effectively.
To achieve this technical purpose, the present invention adopts the following technical solution:
A target clustering method based on three-way decision c-means, in which a cluster c_i is modeled as a positive region, a boundary region and a negative region, denoted POS(c_i), BND(c_i) and NEG(c_i) respectively; the positive region of a cluster consists of the targets that definitely belong to the cluster, the boundary region consists of the targets that may belong to the cluster, and the negative region consists of the targets that cannot belong to the cluster;
The method comprises the following steps:
(1) randomly assigning each target datum x_j to be clustered to the positive region of one of k clusters, where x_j ∈ U and U is the set of all target data to be clustered;
(2) calculating the center point of each of the k clusters;
(3) reassigning all target data to the different regions of the k clusters according to the calculated center points;
(4) checking whether the iteration termination condition is met; if it is not, returning to step (2); otherwise, terminating;
In step (3), all target data are reassigned to the clusters as follows:
Define the relation function r(c_i, x_j) = μ_ij, where μ_ij is the fuzzy membership value expressing the degree of similarity between target x_j and cluster c_i;
Establish the relation vector [r(c_1, x_j), r(c_2, x_j), …, r(c_k, x_j)]^T = [μ_1j, μ_2j, …, μ_kj]^T, which expresses the degree of similarity between target x_j and each cluster;
Define the characteristic function, which extracts the maximum value of the relation vector;
Define the relative relation function, which describes the relation value of target x_j to cluster c_i relative to the other clusters; the larger the value, the closer the relationship between target x_j and cluster c_i; its range is (0, 1];
Define the relative belonging set, which describes the set of clusters to which target x_j may belong, where t_mj and t_nj are respectively the largest and second-largest values in [t_ij], 1 ≤ i ≤ k; the clusters in this set are the clusters to which target x_j may belong: if the set contains only one cluster, target x_j is assigned to the positive region of that cluster; if the set contains two or more clusters, target x_j is assigned to the boundary regions of those clusters;
Establish the evaluation function, which describes the relative relation value between target x_j and cluster c_i; with α = 1 set, the evaluation-based clustering model is as follows:
A target clustering system based on three-way decision c-means, in which a cluster c_i is modeled as a positive region, a boundary region and a negative region, denoted POS(c_i), BND(c_i) and NEG(c_i) respectively; the positive region of a cluster consists of the targets that definitely belong to the cluster, the boundary region consists of the targets that may belong to the cluster, and the negative region consists of the targets that cannot belong to the cluster;
The system comprises the following modules:
an initial assignment module, for randomly assigning each target datum x_j to be clustered to the positive region of one of k clusters, where x_j ∈ U and U is the set of all target data to be clustered;
a center point calculation module, for calculating the center point of each of the k clusters;
an update assignment module, for reassigning all target data to the different regions of the k clusters according to the calculated center points;
an iteration termination determination module, for checking whether the iteration termination condition is met; if it is not, control returns to the center point calculation module; otherwise, the process terminates;
The update assignment module reassigns all target data to the clusters as follows:
Define the relation function r(c_i, x_j) = μ_ij, where μ_ij is the fuzzy membership value expressing the degree of similarity between target x_j and cluster c_i;
Establish the relation vector [r(c_1, x_j), r(c_2, x_j), …, r(c_k, x_j)]^T = [μ_1j, μ_2j, …, μ_kj]^T, which expresses the degree of similarity between target x_j and each cluster;
Define the characteristic function, which extracts the maximum value of the relation vector;
Define the relative relation function, which describes the relation value of target x_j to cluster c_i relative to the other clusters; the larger the value, the closer the relationship between target x_j and cluster c_i; its range is (0, 1];
Define the relative belonging set, which describes the set of clusters to which target x_j may belong, where t_mj and t_nj are respectively the largest and second-largest values in [t_ij], 1 ≤ i ≤ k; the clusters in this set are the clusters to which target x_j may belong: if the set contains only one cluster, target x_j is assigned to the positive region of that cluster; if the set contains two or more clusters, target x_j is assigned to the boundary regions of those clusters;
Establish the evaluation function, which describes the relative relation value between target x_j and cluster c_i; with α = 1 set, the evaluation-based clustering model is as follows:
Further, the center point of a cluster is calculated by the following formula:
where mean_i denotes the center point of cluster c_i; POS(c_i) denotes the positive region of cluster c_i, and |POS(c_i)| the number of targets in that positive region; BND(c_i) denotes the boundary region of cluster c_i, and |BND(c_i)| the number of targets in that boundary region; and w_ij denotes the weight of target x_j for cluster c_i.
Further, the weight w_ij of target x_j for cluster c_i is determined from the values μ_ij ∈ M_xj, where μ_ij is the fuzzy membership value expressing the degree of similarity between target x_j and cluster c_i, and M_xj is the set of fuzzy membership values expressing the degree of similarity between target x_j and the clusters to which it belongs.
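As a rough illustration of this weighting, the Python sketch below assumes that w_ij is obtained by dividing μ_ij by the sum of all values in M_xj, which is one plausible reading of the normalization described in the embodiment. The function name `normalized_weights` and its arguments are hypothetical and do not come from the specification.

```python
def normalized_weights(memberships, member_clusters):
    """Return {cluster index: w_ij} for a single target x_j.

    memberships: fuzzy membership values mu_ij of x_j for all k clusters.
    member_clusters: indices of the clusters whose positive or boundary
        region contains x_j.
    Assumption (not stated in the text): w_ij = mu_ij / sum of M_xj.
    """
    # M_xj: membership values restricted to the clusters x_j belongs to.
    m_xj = {i: memberships[i] for i in member_clusters}
    total = sum(m_xj.values())
    if total == 0:
        raise ValueError("membership values of the member clusters sum to zero")
    return {i: mu / total for i, mu in m_xj.items()}


# x_j lies in the upper approximations of c_1, c_3 and c_4 (indices 0, 2, 3).
weights = normalized_weights([0.4, 0.1, 0.3, 0.2], [0, 2, 3])
```

Under this assumption, a target that belongs to a single positive region always receives weight 1 for that cluster, so no empirical weight has to be chosen.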
Further, the fuzzy membership value characterizing the degree of similarity between target x_j and cluster c_i is calculated as follows:
where μ_ij denotes the fuzzy membership value characterizing the degree of similarity between target x_j and cluster c_i; k denotes the number of clusters; 1 ≤ i ≤ k; 1 ≤ j ≤ n, where n is the number of targets in the data set; d_ij and d_lj denote the Euclidean distances from target x_j to cluster c_i and cluster c_l respectively; and the parameter m > 1.
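The membership computation can be sketched in Python as follows. The sketch assumes the standard fuzzy c-means form μ_ij = 1 / Σ_l (d_ij / d_lj)^(2/(m−1)), which is consistent with the symbols defined here but is an assumption about the formula itself. The function name `fuzzy_memberships` and the handling of a target that coincides with a center are illustrative choices.

```python
import math


def fuzzy_memberships(x, centers, m=2.0):
    """Return [mu_1j, ..., mu_kj] for one target x against k cluster centers.

    Assumes the standard fuzzy c-means membership
    mu_ij = 1 / sum_l (d_ij / d_lj) ** (2 / (m - 1)), with m > 1.
    """
    if m <= 1:
        raise ValueError("m must be greater than 1")
    # Euclidean distances d_ij from the target to every center.
    dists = [math.dist(x, c) for c in centers]
    # Assumption: a target lying exactly on one or more centers is shared
    # equally among those centers, which avoids division by zero.
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_l) ** exponent for d_l in dists)
            for d_i in dists]


mu = fuzzy_memberships((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0)])
```

With m = 2 and distances 1 and 3, the two membership values are 0.9 and 0.1, and the values for one target always sum to 1.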
Further, the three regions of the same cluster satisfy the following conditions:
The three regions of different clusters satisfy the following conditions:
Compared with existing clustering methods, the target clustering method of the present invention, which is based on the three-way c-means algorithm, addresses the uncertain cluster boundaries that are common in practical clustering problems by modeling each cluster as a positive region and a boundary region. The method is applicable to any problem in which cluster boundaries are unclear, so it has a wide range of application and a good clustering effect.
Further, when the cluster center is calculated, the weight of a target is determined by the number of upper approximations (positive region and boundary region) to which the target belongs, rather than by an empirical weight, so that targets can be clustered more effectively.
The present invention can effectively cluster various educational data sets and is especially suitable for fields such as student performance clustering and educational resource clustering.
Detailed Description of the Embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawing and embodiments. It should be understood that the specific embodiments described here merely explain the invention and do not limit it. In addition, the technical features involved in the embodiments described below can be combined with one another as long as they do not conflict.
The specific implementation steps of the present invention are described in further detail below with reference to Fig. 1.
Step 1. Input a D-dimensional educational resource data set to be clustered, the number of clusters k, and the cutoff threshold ξ.
Step 2. Initialization. Generate a random number r for each data item, where r is a natural number between 1 and k. According to this random number r, assign the data item to the positive region of the corresponding cluster c_i.
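A minimal Python sketch of this initialization is given below. It assumes that data items are referred to by their index and that the random number r selects cluster c_r; the function name `random_initial_assignment` and the optional `seed` parameter are hypothetical and added only to make the example reproducible.

```python
import random


def random_initial_assignment(n_items, k, seed=None):
    """Randomly place each data item in the positive region of one cluster.

    Returns a list of k lists; entry i holds the indices of the data items
    placed in the positive region of cluster c_(i+1).
    """
    rng = random.Random(seed)
    positive = [[] for _ in range(k)]
    for j in range(n_items):
        r = rng.randint(1, k)  # natural number between 1 and k, inclusive
        positive[r - 1].append(j)
    return positive


positive_regions = random_initial_assignment(n_items=10, k=3, seed=0)
```

After this step every boundary region is empty, so each data item belongs to exactly one cluster until the first reassignment.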
Step 3. Calculate the center point of each educational resource cluster.
Calculate the fuzzy membership values of each data item. According to formula (1), the fuzzy membership value of each data item relative to each cluster can be calculated, as shown in the table.
Determine the clusters whose positive or boundary region each data item belongs to. For any data item x_j, compute the set of clusters whose upper approximation contains it. For example, if x_j lies in the upper approximations of the three clusters c_1, c_3 and c_4, the set consists of these three clusters.
Find the fuzzy membership values of the data item relative to these clusters. For any data item x_j, compute its set M_xj. In the example above, M_xj = {μ_1j, μ_3j, μ_4j}.
Normalize these values to obtain w_ij.
Use the normalized values as weights to calculate the mean of each cluster.
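The last sub-step can be illustrated with the Python sketch below. It assumes that the mean of a cluster is the weighted average of all data items in its positive and boundary regions, each weighted by its normalized value w_ij; this is a simplifying assumption and may differ from the exact center formula of the invention. The function name `weighted_center` is hypothetical.

```python
def weighted_center(points, weights):
    """Weighted mean of the data items in one cluster's upper approximation.

    points: coordinate tuples of the items in POS(c_i) and BND(c_i).
    weights: the corresponding normalized weights w_ij.
    Assumption: mean_i = sum_j(w_ij * x_j) / sum_j(w_ij).
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    dim = len(points[0])
    return tuple(
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(dim)
    )


# Two positive-region items (weight 1) and one boundary item (weight 0.5).
center = weighted_center([(0.0, 0.0), (2.0, 0.0), (4.0, 4.0)], [1.0, 1.0, 0.5])
```

In this example the boundary item pulls the center toward (4, 4) only half as strongly as a positive-region item would.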
Step 4. Reassign the data to the clusters according to the mean of each cluster.
Define the relation function r(c_i, x_j) = μ_ij.
Define the characteristic function.
Define the relative relation function.
Define the relative belonging set.
Establish the evaluation function and assign the data to the different clusters:
POS(c_i) = {x_j ∈ U | v(c_i, x_j) ≥ 1};
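The reassignment rule can be sketched in Python under explicit assumptions. The relative relation value is assumed here to be t_ij = μ_ij / max_l μ_lj, which has the stated range (0, 1] for positive memberships, and the relative belonging set is assumed to contain every cluster whose t_ij reaches a cutoff. Both the exact functions and the parameter `threshold` are hypothetical stand-ins; only the final rule (one candidate cluster gives the positive region, two or more give the boundary regions) is taken from the text.

```python
def assign_regions(memberships, threshold=0.8):
    """Decide the region and clusters for one data item x_j.

    memberships: fuzzy membership values mu_ij of x_j for all k clusters.
    threshold: hypothetical cutoff on the assumed relative relation value.
    Returns ("positive", [i]) or ("boundary", [i1, i2, ...]).
    """
    top = max(memberships)
    if top <= 0:
        raise ValueError("at least one membership value must be positive")
    # Assumed relative relation value t_ij; the closest cluster gets 1.0.
    relative = [mu / top for mu in memberships]
    # Assumed relative belonging set: clusters that are nearly as close
    # as the closest one.
    candidates = [i for i, t in enumerate(relative) if t >= threshold]
    region = "positive" if len(candidates) == 1 else "boundary"
    return region, candidates


clear_case = assign_regions([0.9, 0.05, 0.05])
ambiguous_case = assign_regions([0.45, 0.40, 0.15])
```

The first item is clearly closest to one cluster and goes to its positive region, while the second is almost equally close to two clusters and goes to the boundary regions of both.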
Step 5. Check the termination condition. The mean of each cluster is recorded at every iteration. The algorithm is judged to have converged if the difference between the mean of each cluster and its mean in the previous iteration is less than the preset cutoff threshold ξ, or if the algorithm has iterated 100 times. As soon as either condition is met, the algorithm proceeds to step 6; otherwise it returns to step 3.
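The termination check can be sketched in Python as follows. The sketch assumes that the "difference" between two means is measured as the Euclidean distance between the old and new center of the same cluster, which the text does not specify. The function name `has_converged` and its parameter names are hypothetical.

```python
import math


def has_converged(prev_centers, new_centers, xi, iteration, max_iter=100):
    """Return True when the loop should stop and proceed to the output step.

    Stops when every center moved less than xi since the previous
    iteration, or when the iteration count has reached max_iter.
    Assumption: movement is measured as Euclidean distance.
    """
    if iteration >= max_iter:
        return True
    return all(
        math.dist(prev, new) < xi
        for prev, new in zip(prev_centers, new_centers)
    )


stop = has_converged([(0.0, 0.0)], [(0.0, 0.0005)], xi=0.001, iteration=3)
```

Requiring every cluster to satisfy the threshold, rather than only their average, keeps one slowly drifting center from being overlooked.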
Step 6. Output the positive region and the boundary region of each cluster.
Those skilled in the art will readily understand that the foregoing describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.